OPTIMAL  AND  ROBUST  MEMORYLESS  DISCRIMINATION
FROM  DEPENDENT  OBSERVATIONS 


CHAPTER  1 


INTRODUCTION 


In  this  thesis  we  will  consider  various  special  cases  of  the  binary  hypothesis  testing 
problem,  which  may  be  informally  described  as  follows:  One  observes  some  random  event 
and  wishes  to  decide  on  the  basis  of  the  observation  between  two  hypotheses  which 
concern  the  nature  of  the  random  event.  In  all  cases  that  we  will  consider,  the  random 
event is modeled by a discrete-time random process $\{X_i\}$ and the two hypotheses concern
the  probability  distribution  of  the  process.  More  specifically,  we  consider  the  following 
two  hypotheses: 


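$$
\begin{aligned}
H_0 &: \{X_i\}_{i=1}^{n} \text{ has a density } f_0^{(n)}(x) \\
H_1 &: \{X_i\}_{i=1}^{n} \text{ has a density } f_1^{(n)}(x)
\end{aligned}
\tag{1.1}
$$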


where $x$ denotes the $n$-tuple $(x_1,\ldots,x_n)$. A decision rule for this hypothesis testing
problem is a measurable mapping $d$ which maps the observation space $\mathbb{R}^n$ into the set
$\{0,1\}$, the interpretation being that if $x$ is observed and $d(x) = i$, then one decides that
$H_i$ is true. Such a mapping may also be referred to in this thesis as a test, a receiver,
or a discriminator. If $f_1^{(n)}$ is absolutely continuous with respect to $f_0^{(n)}$ in the sense
that $f_0^{(n)}(x) = 0 \Rightarrow f_1^{(n)}(x) = 0$, then the optimal decision rule for the Bayesian or
Neyman-Pearson criterion [8] is given by the likelihood ratio test (LRT):


$$
d(x) = 1 \quad\text{iff}\quad \frac{f_1^{(n)}(x)}{f_0^{(n)}(x)} > \eta
$$

where $x$ is the observed vector, and the choice of the threshold $\eta$ depends more specifically
on  the  particular  criterion  and  the  details  of  the  problem.  If  the  observed  random  process 
is  independent  and  identically  distributed  (iid)  under  either  hypothesis,  then  only  the 
marginal  densities  are  involved,  and  the  LRT  becomes 


$$
d(x) = 1 \quad\text{iff}\quad \prod_{i=1}^{n} \frac{f_1(x_i)}{f_0(x_i)} > \eta
$$


where $f_0$ is the marginal density under $H_0$ and $f_1$ is the marginal density under $H_1$.
Taking  logarithms  on  each  side,  we  may  also  write  this  in  the  form 


$$
d(x) = 1 \quad\text{iff}\quad \sum_{i=1}^{n} \log\frac{f_1(x_i)}{f_0(x_i)} > \log\eta.
\tag{1.2}
$$
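As a small numerical illustration, the following sketch evaluates the iid LRT both in its product form and in the log-sum form (1.2) and checks that the two agree. The Gaussian marginals, the threshold value, the sample values, and all function names are assumptions chosen for illustration; they are not taken from the thesis.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution (an illustrative marginal)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def lrt_product(xs, f0, f1, eta):
    """Decide 1 iff prod f1(x_i)/f0(x_i) > eta."""
    ratio = 1.0
    for x in xs:
        ratio *= f1(x) / f0(x)
    return 1 if ratio > eta else 0

def lrt_log_sum(xs, f0, f1, eta):
    """Decide 1 iff sum log[f1(x_i)/f0(x_i)] > log(eta), as in (1.2)."""
    total = sum(math.log(f1(x)) - math.log(f0(x)) for x in xs)
    return 1 if total > math.log(eta) else 0

# Hypothetical marginals: N(0, 1) under H0 and N(1, 1) under H1.
f0 = lambda x: gaussian_pdf(x, 0.0, 1.0)
f1 = lambda x: gaussian_pdf(x, 1.0, 1.0)

observations = [0.9, 1.4, -0.2, 0.7, 1.1]
decision_product = lrt_product(observations, f0, f1, eta=1.0)
decision_log = lrt_log_sum(observations, f0, f1, eta=1.0)
```

For these particular marginals the log-likelihood ratio of a single sample reduces to $x_i - 1/2$, so the log-sum form needs only a running sum of the observations.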


Because  of  the  simple  form  of  the  LRT  given  by  (1.2),  this  result  has  proven  to 
be  extremely  useful  in  a  practical  sense  whenever  the  processes  can  be  assumed  to  be 
iid.  However,  if  at  least  one  of  the  processes  is  assumed  to  be  dependent,  the  LRT  might 
be  of  little  practical  value.  Consider  two  such  situations.  First,  it  may  be  the  case  that 
one  of  the  n-dimensional  densities  involved  lacks  a  closed  form  expression,  so  that  the 
LRT also lacks a closed form expression. Such is generally the case for the Rayleigh
distribution in Appendix A. In this situation the LRT is not implementable. Second, one
may wish to implement a test which does not require an assumption on the particular
form  of  the  n-dimensional  densities,  such  as  is  required  for  the  LRT.  For  example,  if  one 
wants  to  design  a  test  based  on  experimental  data,  then  it  is  desirable  to  base  such  a  test 
on  the  empirical  marginal  densities  and  possibly  some  of  the  lower  order  moments,  since 
the  amount  of  data  necessary  to  establish  the  n-dimensional  empirical  densities  might  be
rather  unmanageable.  Since  the  LRT  requires  explicit  knowledge  of  the  n-dimensional 
densities,  it  is  clear  that  the  LRT  is  inappropriate  in  this  case. 

In the event that one chooses not to use an LRT, one may proceed by specifying a
test structure and then attempt to determine the optimal test out of the class of all tests
which  have  that  structure.  A  test  structure  which  has  appeared  often  in  the  literature 
is  the  following: 

$$
d(x) = 1 \quad\text{iff}\quad T_n(x) \in A
\tag{1.3}
$$

where

$$
T_n(x) = \sum_{i=1}^{n} \psi(x_i).
\tag{1.4}
$$

Here $\psi$ is a Borel measurable function which will often be referred to as a nonlinearity,
and $A$ is a Borel subset of the real line which will be called the critical region. The test
statistic $T_n$ has been referred to in the literature as a zero-memory nonlinearity (ZNL).
Note that in the case where the process is iid under either hypothesis, the log-LRT
(1.2) has this form with $\psi(x) = \log[f_1(x)/f_0(x)]$ and $A = [\log\eta, \infty)$. Conversely, if the
process  is  not  iid  under  at  least  one  of  the  hypotheses,  then  the  LRT  will  involve  memory, 
and  consequently  a  memoryless  decision  rule  will  be  suboptimal.  Nevertheless,  there  are 
some  advantages  to  using  a  memoryless  decision  rule,  particularly  when  simplicity  of 
implementation  is  important.  In  this  thesis  we  consider  in  detail  the  use  of  various 
memoryless  decision  rules  of  the  form  (1.3)  as  applied  to  the  discrimination  problem 
(1.1) with the assumption that the process is stationary under either hypothesis.
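The structure (1.3)-(1.4) can be sketched in a few lines. The two nonlinearities, the threshold, and the sample values below are hypothetical choices made only to show how the same detector skeleton accepts different choices of the nonlinearity and the critical region.

```python
def memoryless_statistic(xs, psi):
    """T_n(x) = sum of psi(x_i): each sample is processed without memory."""
    return sum(psi(x) for x in xs)

def decide(xs, psi, in_critical_region):
    """Rule (1.3): decide 1 iff T_n(x) lies in the critical region A."""
    return 1 if in_critical_region(memoryless_statistic(xs, psi)) else 0

# Two illustrative nonlinearities (assumptions, not from the thesis).
def log_likelihood_ratio(x):
    # log[f1(x)/f0(x)] for N(1,1) against N(0,1) marginals.
    return x - 0.5

def soft_limiter(x, clip=1.0):
    # Clips each sample to [-clip, clip] before summation.
    return max(-clip, min(clip, x))

# Critical region A = [gamma, infinity) with a hypothetical threshold gamma.
gamma = 0.0
region = lambda t: t >= gamma

xs = [0.9, 1.4, -0.2, 0.7, 1.1]
t_llr = memoryless_statistic(xs, log_likelihood_ratio)
t_clip = memoryless_statistic(xs, soft_limiter)
```

Passing the critical region as a predicate keeps the sketch general: an interval, a union of intervals, or any other set can be supplied without changing the detector.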

In  order  to  implement  a  test  of  the  form  (1.3),  one  must  specify  the  nonlinearity 
$\psi$ and the critical region $A$. Most often, $A$ will be an interval, such as the interval
$(\gamma, \infty)$ which involves a single threshold $\gamma$. In this case, one would probably proceed by
specifying $\psi$ first and then choosing the value of the threshold $\gamma$ through simulation or
actual testing to adjust the error probabilities to their desired values. Before determining
the nonlinearity $\psi$, however, one must decide on a performance criterion. Then $\psi$ will
be chosen to be optimal with respect to this criterion. Although the most natural
and  useful  criterion  is  that  of  the  error  probabilities,  for  a  test  of  the  form  (1.3)  one 
cannot  in  most  cases  obtain  a  closed  form  expression  for  the  error  probabilities,  and  thus 
another  performance  measure  may  be  more  useful.  Such  is  the  case  for  the  results  of  this 
thesis,  where  we  consider  performance  measures  which  involve  the  mean  and  asymptotic 
variance  of  the  test  statistic  Tn  under  the  two  hypotheses.  For  stationary  processes,  the 
mean  of  Tn  under  Hi  is  given  by 

$$
E_i\, T_n(X) = E_i \sum_{j=1}^{n} \psi(X_j) = n\, E_i\, \psi(X_1)
\tag{1.5}
$$

and  the  variance  under  Hi  is 

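$$
\begin{aligned}
\mathrm{Var}_i\, T_n(X) &= E_i\, T_n(X)^2 - [E_i\, T_n(X)]^2 \\
&= E_i \sum_{j=1}^{n}\sum_{k=1}^{n} \psi(X_j)\psi(X_k) - n^2\, [E_i\, \psi(X_1)]^2 \\
&= \sum_{j=1}^{n} \mathrm{Var}_i\, \psi(X_j) + 2\sum_{j=1}^{n-1}\sum_{k=j+1}^{n} \mathrm{Cov}_i[\psi(X_j), \psi(X_k)] \\
&= n\,\mathrm{Var}_i\, \psi(X_1) + 2\sum_{j=1}^{n-1}\sum_{k=j+1}^{n} \mathrm{Cov}_i[\psi(X_1), \psi(X_{k-j+1})].
\end{aligned}
\tag{1.6}
$$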

In our notation, $E_i$, $\mathrm{Var}_i$, and $\mathrm{Cov}_i$ denote, respectively, the expectation, variance, and
covariance operations under hypothesis $H_i$. We now define two functionals $\mu_i(\psi)$ and
$\sigma_i^2(\psi)$ which will appear in the work which follows. Define

$$
\mu_i(\psi) = E_i\, \psi(X_1)
\tag{1.7}
$$

and in cases where the sum exists, define

$$
\sigma_i^2(\psi) = \mathrm{Var}_i\, \psi(X_1) + 2\sum_{j=1}^{\infty} \mathrm{Cov}_i[\psi(X_1), \psi(X_{j+1})].
\tag{1.8}
$$

Thus we have $E_i\, T_n = n\mu_i(\psi)$, and we note also that if the sum in (1.8) converges, then
$\frac{1}{n}\,|\mathrm{Var}_i\, T_n(X) - n\sigma_i^2(\psi)| \to 0$ as $n \to \infty$ so that $\mathrm{Var}_i\, T_n(X) \approx n\sigma_i^2(\psi)$ for large values of
$n$. Thus $n\sigma_i^2$ is the asymptotic variance of $T_n$ under hypothesis $H_i$. These functionals, or
“moments,” of the nonlinearity $\psi$ have rather nice expressions in terms of the marginal
and joint densities of the processes, and by considering performance measures involving
these moments as opposed to the error probabilities, we shall find that the analysis
becomes much more manageable. Note also that for such performance measures, the
$n$-dimensional densities are not involved for $n > 2$.
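Because the moments in (1.7) and (1.8) involve only marginal and pairwise quantities, they can be estimated from a single sample path. The sketch below is one hedged way to do this: the infinite sum in (1.8) is truncated at a user-chosen lag (an assumption that covariances beyond that lag are negligible), and the moving-average test process, the seed, and the function names are illustrative and not taken from the thesis.

```python
import random

def estimate_moments(xs, psi, max_lag):
    """Estimate mu(psi) and sigma^2(psi) of (1.7)-(1.8) from one sample path.

    The sum in (1.8) is truncated at max_lag, which assumes that the
    covariances are negligible beyond that lag.
    """
    ys = [psi(x) for x in xs]
    n = len(ys)
    mu = sum(ys) / n
    centered = [y - mu for y in ys]

    def autocov(lag):
        return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n

    sigma2 = autocov(0) + 2.0 * sum(autocov(j) for j in range(1, max_lag + 1))
    return mu, sigma2

def moving_average_path(n, rng):
    """Test process X_i = Z_i + Z_{i-1} with iid N(0,1) variables Z_i."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [z[i + 1] + z[i] for i in range(n)]

rng = random.Random(0)
path = moving_average_path(50000, rng)
mu_hat, sigma2_hat = estimate_moments(path, lambda x: x, max_lag=5)
# For this process with psi(x) = x: mu = 0, Var = 2, lag-1 covariance = 1,
# so (1.8) gives sigma^2 = 2 + 2*1 = 4.
```

Choosing the truncation lag is a trade-off: too small a lag omits real covariance terms, while too large a lag adds estimation noise.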

In  the  chapters  that  follow,  the  performance  measures  which  are  derived  are  based 
on  central  limit  theory;  therefore  it  will  be  necessary  to  restrict  the  class  of  processes 
which  will  be  considered.  In  particular,  we  desire  that  the  processes  involved  demonstrate 
some  kind  of  asymptotic  independence  so  that  central  limit  theory  may  be  applied.  The 
type  of  asymptotic  independence  which  is  appropriate  for  the  work  here  is  that  which 
is defined by various mixing conditions. Let $\mathcal{F}_a^b$ denote the $\sigma$-field of events generated
by $\{X_i, a \le i \le b\}$. Then the process $\{X_i\}$ is said to be strong mixing if there exists a
sequence $\{\alpha_n\}$ such that $\alpha_n \to 0$ and

$$
|P(A \cap B) - P(A)P(B)| \le \alpha_n
\tag{1.9}
$$

for any events $A \in \mathcal{F}_{-\infty}^{k}$, $B \in \mathcal{F}_{k+n}^{\infty}$. If it is also true that


$$
|P(A \cap B) - P(A)P(B)| \le \phi_n P(B)
\tag{1.10}
$$


for some sequence $\{\phi_n\}$ with $\phi_n \to 0$, then the process is called $\phi$-mixing. The $\phi$-mixing
condition clearly implies the strong mixing condition. Finally, define the process $\{X_i\}$ to
be $m$-dependent if for every integer $k$ we have that $\mathcal{F}_{-\infty}^{k}$ and $\mathcal{F}_{k+m+1}^{\infty}$ are independent.

Note that $m$-dependence is a special case of $\phi$-mixing with $\phi_n = 0$ for $n > m$. We
include $m$-dependence because it is easier to work with analytically and because in
certain situations it can approximate the $\phi$-mixing condition well if $m$ is sufficiently large.
Because mixing conditions are defined in terms of the underlying $\sigma$-fields of events, the
conditions are preserved by memoryless transformations, so that $\{g(X_i)\}$ will satisfy the
same mixing condition as $\{X_i\}$, provided $g$ is measurable. Central limit theorems have
been proved for strong mixing and $\phi$-mixing processes, and one such theorem is given
here  as  Theorem  1. 

Theorem 1. Let $\{X_i\}$ be a stationary $\phi$-mixing process with $\sum_n \phi_n^{1/2} < \infty$ and
let $\psi$ be a measurable real-valued function such that $E|\psi(X_1)| < \infty$ and $E\,\psi(X_1)^2 < \infty$.
Then the series in (1.8) converges absolutely and $[T_n(X) - n\mu(\psi)]/\sqrt{n\sigma^2(\psi)}$ converges in
distribution to a standard normal random variable (having zero mean and unit variance),
provided $\sigma^2(\psi) > 0$.

For a proof, see [2] and [4]. In Chapters 2 and 3, we shall assume the $m$-dependent or
$\phi$-mixing condition and make reference to Theorem 1. In Chapter 4 we shall assume
the  strong  mixing  condition  and  shall  also  state  there  a  central  limit  theorem  for  strong 
mixing  processes. 
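The conclusion of Theorem 1 can be checked by simulation in a simple case. The sketch below uses a 1-dependent Gaussian moving average (an assumed test process, not one from the thesis) with the sign nonlinearity, for which $\mu = 0$ and, by the arcsine law for Gaussian pairs, $\sigma^2 = 5/3$. The seed, sample size, and number of replications are arbitrary illustrative choices.

```python
import math
import random

def sign(x):
    return 1.0 if x > 0 else -1.0

def normalized_statistic(n, rng):
    """[T_n - n*mu] / sqrt(n*sigma^2) for psi = sign and X_i = Z_i + Z_{i-1}."""
    # mu = 0 by symmetry. The lag-1 correlation of X is 1/2, so the arcsine
    # law gives Cov[sign(X_1), sign(X_2)] = (2/pi)*asin(1/2) = 1/3, and (1.8)
    # yields sigma^2 = 1 + 2*(1/3) = 5/3.
    mu, sigma2 = 0.0, 5.0 / 3.0
    z_prev = rng.gauss(0.0, 1.0)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        total += sign(z + z_prev)
        z_prev = z
    return (total - n * mu) / math.sqrt(n * sigma2)

rng = random.Random(1)
samples = [normalized_statistic(200, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
# Fraction of replications inside [-1.96, 1.96]; roughly 0.95 for a
# standard normal variable.
coverage = sum(abs(s) <= 1.96 for s in samples) / len(samples)
```

If the normalization used only the marginal variance (that is, $\sigma^2 = 1$, as under an iid assumption), the sample variance of the normalized statistic would come out near $5/3$ instead of 1, which is the effect of dependence that (1.8) accounts for.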

A  performance  measure  which  has  received  a  lot  of  attention  in  the  literature 
is  that  of  the  efficacy  of  a  test,  which  is  based  on  the  concept  of  asymptotic  relative 
efficiency  (ARE).  In  order  to  define  the  efficacy  of  a  test,  consider  the  following  problem:


$$
\begin{aligned}
H_0 &: X_i = N_i \\
H_1 &: X_i = N_i + \theta
\end{aligned}
\tag{1.11}
$$


where the process $\{N_i\}$ is a stationary process which represents a noise process. Thus
under $H_0$ we observe strictly noise while under $H_1$ we observe a constant signal $\theta$ plus
noise. Let $f_N^{(n)}$ denote the $n$-dimensional density of the noise process. Note that this
situation is a special case of the problem (1.1) with $f_0^{(n)}(x) = f_N^{(n)}(x)$ and $f_1^{(n)}(x) =
f_N^{(n)}(x - \theta)$. If we consider a test involving the test statistic $T_n^{(1)}$, then for a given signal
strength $\theta$ and fixed error probabilities $\alpha$, $\beta$ under $H_0$ and $H_1$, respectively, a minimum
sample size $n_1$ is required. Under similar conditions but with a different test statistic
$T_n^{(2)}$ a sample size $n_2$ is required. The ARE of $T^{(2)}$ with respect to $T^{(1)}$ may now be
defined as the limit of the ratio $n_1/n_2$ as $\theta \to 0$ and $\alpha$ and $\beta$ remain fixed. The efficacy
of $T^{(1)}$ is defined to be


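$$
\eta_1 = \lim_{n\to\infty} \frac{\left[\frac{\partial}{\partial\theta} E_\theta\, T_n^{(1)}\big|_{\theta=0}\right]^2}{n\,\mathrm{Var}_0\, T_n^{(1)}}
\tag{1.12}
$$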


if the limit exists. Here the subscripts for the expectation and variance operators denote
the value of $\theta$ under which the operations are performed; e.g. $\mathrm{Var}_0$ denotes the variance
under $H_0$. The importance of the efficacy stems from the Pitman-Noether theorem,
which states that if certain conditions are satisfied individually by $T^{(1)}$ and $T^{(2)}$, then
the ARE of $T^{(2)}$ with respect to $T^{(1)}$ is equal to the ratio $\eta_2/\eta_1$ of the two efficacies. Hence
$T^{(1)}$ may be considered a better test than $T^{(2)}$ under the ARE criterion if $\eta_1 > \eta_2$. The
conditions on $T^{(1)}$ which are necessary for the Pitman-Noether theorem are as follows:


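(i) $\frac{\partial}{\partial\theta} E_\theta\, T_n^{(1)}\big|_{\theta=0} > 0$

(ii) $\eta_1 > 0$

(iii) $\lim_{n\to\infty} \dfrac{\left[\frac{\partial}{\partial\theta} E_\theta\, T_n^{(1)}\right]_{\theta=\theta_n}}{\left[\frac{\partial}{\partial\theta} E_\theta\, T_n^{(1)}\right]_{\theta=0}} = \lim_{n\to\infty} \dfrac{\mathrm{Var}_{\theta_n}\, T_n^{(1)}}{\mathrm{Var}_0\, T_n^{(1)}} = 1$, where $\theta_n = K/\sqrt{n}$ for some constant $K$.

(iv) $[T_n^{(1)} - E_\theta\, T_n^{(1)}]/\sqrt{\mathrm{Var}_\theta\, T_n^{(1)}}$ converges in distribution to a standard normal random
variable as $n \to \infty$ for all $\theta \in (0, \theta_0)$.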


Similar conditions are required of $T^{(2)}$. The condition (iv) requires that the test statistic
$T^{(1)}$ be asymptotically normal. In cases where the noise process is iid, it is straightforward
to apply a central limit theorem to show that the condition (iv) holds for the
test statistic in (1.4), and Miller and Thomas [11] have derived the nonlinearity $\psi$ which
maximizes the efficacy in such a situation. They also generalize to the case of a nonconstant
signal involving the test statistic $T_n(x) = \sum_{i=1}^{n} \psi_i(x_i)$ for which the nonlinearity
varies with time. In Poor and Thomas [1], the optimal nonlinearity is derived, still with
respect to the efficacy performance measure, for the case where the noise process is
$m$-dependent. Halverson and Wise [2] show how to correctly extend this result to the more
general case of $\phi$-mixing noise. In either of these latter two cases, Theorem 1 is required
in order to demonstrate that the test statistic is asymptotically normal.
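To make the efficacy concrete, consider the simplest setting of iid noise and a memoryless statistic $T_n = \sum \psi(X_i)$. Writing $m(\theta) = E_\theta\,\psi(X_1)$ and $v_0 = \mathrm{Var}_0\,\psi(X_1)$, we have $E_\theta T_n = n\,m(\theta)$ and $\mathrm{Var}_0 T_n = n v_0$, so the efficacy reduces to $m'(0)^2 / v_0$. The sketch below evaluates this for the linear and sign detectors in standard Gaussian noise; the noise model, the finite-difference step, and the function names are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def efficacy_memoryless(mean_under_theta, var_under_null, h=1e-4):
    """Efficacy m'(0)^2 / v0 of T_n = sum psi(X_i) in iid noise.

    mean_under_theta(theta) returns m(theta) = E_theta psi(X_1), and
    var_under_null is v0 = Var_0 psi(X_1). The derivative m'(0) is
    approximated by a central difference with step h.
    """
    slope = (mean_under_theta(h) - mean_under_theta(-h)) / (2.0 * h)
    return slope * slope / var_under_null

# Linear detector psi(x) = x in N(0,1) noise: m(theta) = theta, v0 = 1.
eta_linear = efficacy_memoryless(lambda theta: theta, 1.0)

# Sign detector psi(x) = sign(x): m(theta) = 2*Phi(theta) - 1, v0 = 1.
eta_sign = efficacy_memoryless(
    lambda theta: 2.0 * STD_NORMAL.cdf(theta) - 1.0, 1.0
)

# ARE of the sign detector with respect to the linear detector.
are_sign_vs_linear = eta_sign / eta_linear
```

The ratio evaluates to approximately $2/\pi \approx 0.637$, the familiar ARE of the sign detector relative to the linear detector in Gaussian noise.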

In  this  thesis  we  shall  not  confine  ourselves  to  the  weak  signal  in  noise  problem,  but 
we  shall  consider  the  more  general  problem  of  discrimination  (1.1),  with  the  assumption 
that  the  observed  process  is  stationary  and  satisfies  a  mixing  condition  under  either 
hypothesis.  For  this  type  of  problem  the  efficacy  performance  measure  is  no  longer 
appropriate.  We  shall  therefore  derive  the  appropriate  performance  measures.  These 
performance  measures,  which  are  derived  under  different  problem  formulations,  are  all 
of  the  form 


$$
\frac{(\mu_1 - \mu_0)^2}{\|(\sigma_0, \sigma_1)\|^2}
\tag{1.13}
$$

where the denominator is the square of some “norm” of the vector $(\sigma_0, \sigma_1)$, and where $\mu_i$
and $\sigma_i$ are defined by (1.7) and (1.8). Thus these performance measures are functionals
of the nonlinearities. In Chapter 2 we consider the problem (1.1) under a Neyman-Pearson
formulation. Thus if $P_i$ denotes the probability of error when $H_i$ is true, then
the formulation considered here is to minimize $P_1$ subject to the constraint that $P_0 \le \alpha$.
For  this  situation,  it  will  be  shown  that  the  optimal  receiver  for  large  sample  sizes  (that 
is,  in  an  asymptotic  sense)  is  such  that  it  maximizes  the  performance  measure 


$$
S_1 = \frac{(\mu_1 - \mu_0)^2}{\sigma_1^2}.
\tag{1.14}
$$


For the reverse situation where $P_0$ is minimized subject to $P_1 \le \alpha$, the performance
measure is

$$
S_0 = \frac{(\mu_1 - \mu_0)^2}{\sigma_0^2}.
\tag{1.15}
$$

In each of these two performance measures, the “norm” in the denominator is $\|(\sigma_0, \sigma_1)\| =
\sigma_i$, which, technically speaking, is a pseudo-norm, since $\|(\sigma_0, \sigma_1)\| = 0$ does not necessarily
imply that $(\sigma_0, \sigma_1) = (0, 0)$. Observe the similarity of the performance measure $S_1$ to
the efficacy measure (1.12). This similarity arises from the fact that both performance
measures are asymptotic performance measures based on central limit theory. For the
efficacy, however, the assumptions (i)-(iv) are necessary, whereas fewer assumptions are
necessary to justify the use of $S_1$. Also in Chapter 2, the nonlinearity which maximizes
the performance measure $S_1$ is shown to satisfy a Fredholm integral equation of the second
kind. It is also shown how the integral equation can be solved using Hilbert-Schmidt
theory. In Chapter 3, we consider again the problem (1.1), this time under a minimax
formulation; that is, we desire to minimize the maximum of $P_0$ and $P_1$. It will be shown
that the optimum test statistic is one which maximizes the performance measure


$$
S_2 = \frac{(\mu_1 - \mu_0)^2}{(\sigma_0 + \sigma_1)^2}.
\tag{1.16}
$$


The  nonlinearity  which  is  optimal  for  this  performance  measure  is  shown  to  satisfy  a 
nonlinear  integral  equation  for  which  a  closed  form  solution  cannot  be  given;  however, 
it  is  shown  how  the  solution  can  be  obtained  numerically  using  an  iterative  procedure. 
By modifying the minimax formulation slightly, it is shown that one can also derive the
performance measure

$$
S_3 = \frac{(\mu_1 - \mu_0)^2}{\sigma_0^2 + \sigma_1^2},
\tag{1.17}
$$


which has the Euclidean norm in the denominator. It will be shown that the maximization
of $S_3$ leads to a linear integral equation. In Chapter 4, the issue of robustness is
addressed. The approach is that of game theory, or minimax theory, where one tries to
design the optimal receiver to match the worst case densities chosen out of uncertainty
classes. Results are given here for the performance measures $S_1$ (and consequently $S_0$
as well) and $S_3$. In Chapter 5, the theory is applied to the problem of discrimination
between a Rayleigh density and a lognormal density, where strong correlation is present.
The nonlinearity which maximizes each of the performance measures is computed numerically,
and the performance results from computer simulations are presented. The
simulation  results  are  compared  to  the  results  for  the  receiver  which  is  designed  under 
the  assumption  that  the  processes  are  iid.  Chapter  5  also  contains  a  discussion  of  the 
results  of  the  thesis. 
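All of the performance measures introduced in this chapter share the form (1.13) and differ only in the “norm” applied to the pair of asymptotic standard deviations. The sketch below collects them in one helper; the function name, the dictionary keys (in particular the label S2 for the measure in (1.16)), and the numerical inputs are illustrative assumptions.

```python
def performance_measures(mu0, mu1, sigma0, sigma1):
    """Return the four measures of the form (mu1 - mu0)^2 / ||(sigma0, sigma1)||^2."""
    gap2 = (mu1 - mu0) ** 2
    return {
        "S0": gap2 / sigma0 ** 2,                   # P0 minimized, P1 constrained
        "S1": gap2 / sigma1 ** 2,                   # measure (1.14)
        "S2": gap2 / (sigma0 + sigma1) ** 2,        # measure (1.16), minimax
        "S3": gap2 / (sigma0 ** 2 + sigma1 ** 2),   # measure (1.17), Euclidean norm
    }

# Hypothetical moments of some nonlinearity under the two hypotheses.
measures = performance_measures(mu0=0.0, mu1=3.0, sigma0=1.0, sigma1=2.0)
```

Since $(\sigma_0 + \sigma_1)^2 \ge \sigma_0^2 + \sigma_1^2 \ge \max(\sigma_0^2, \sigma_1^2)$, the minimax measure is never larger than the Euclidean one, which in turn is never larger than either single-sided measure.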
CHAPTER  2


THE  NEYMAN-PEARSON  FORMULATION 

2.1 The performance measure $S_1$

In  this  chapter,  we  consider  in  detail  the  hypothesis  testing  problem  (1.1)  under 
a  Neyman-Pearson  formulation.  Our  informal  statement  of  the  problem  is  the  following: 

$$
\begin{aligned}
&\text{minimize } P_1 \\
&\text{subject to } P_0 \le \alpha
\end{aligned}
\tag{2.1}
$$

where $P_i$ denotes the probability of error when $H_i$ is true. The statement
of the problem (2.1) is informal because it depends implicitly on the sample size $n$,
and although we are interested in tests with a fixed sample size, we do not wish to
specify $n$ before we consider the problem (2.1). When we speak of a test or decision rule,
we shall actually mean a family of decision rules, one for each $n$, and in comparing
different tests, we shall not explicitly mention a particular value of $n$. Since for any
reasonable test $P_1 \to 0$ as $n \to \infty$, we may state our problem more accurately in this
way: considering all level $\alpha$ tests (i.e. $P_0 \le \alpha$ for all $n$), find the test for which the rate
of convergence of $P_1$ to 0 is fastest. We can see now that if for some test $d^{(1)}$ the rate of
convergence is faster than the rate for another test $d^{(2)}$, then there is an integer $N$ such
that $d^{(1)}$ is better than $d^{(2)}$ in the sense implied by (2.1) whenever the sample size $n$ is
greater than $N$. In this section we shall derive a performance measure $S_1$ which specifies
(approximately) the rate at which $P_1$ converges to 0, and the connection between this
performance measure and the Neyman-Pearson problem (2.1) should be clear.

We  will  restrict  our  attention  to  only  those  decision  rules  of  the  form  (1.3)  with  the 
assumption  that  the  test  statistic  Tn  is  asymptotically  normal  under  either  hypothesis. 
Thus under hypothesis $H_i$ we assume that there exist constants $\mu_i$ and $\sigma_i^2 > 0$ such
that $(T_n - n\mu_i)/\sqrt{n\sigma_i^2}$ converges in distribution to a standard normal random variable
as $n \to \infty$. In the case that the test statistic has the form (1.4) and the conditions of
Theorem 1 are satisfied, the constants $\mu_i$ and $\sigma_i^2$ are given by (1.7) and (1.8), respectively.
With this assumption, then, for large values of $n$ the distribution of the test statistic
is approximately normal with mean $n\mu_i$ and variance $n\sigma_i^2$ when $H_i$ is true. Taking
a  heuristic  approach,  one  can  use  this  knowledge  to  choose  the  critical  region  A  by 
considering  the  decision  rule  (1.3)  to  be  equivalent  to  an  LRT  between  two  Gaussian 
densities.  In  our  case,  the  two  Gaussian  densities  are 


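$$
\begin{aligned}
\varphi_0(t) &= \frac{1}{\sqrt{2\pi n\sigma_0^2}} \exp\left\{-\frac{(t - n\mu_0)^2}{2n\sigma_0^2}\right\} \\
\varphi_1(t) &= \frac{1}{\sqrt{2\pi n\sigma_1^2}} \exp\left\{-\frac{(t - n\mu_1)^2}{2n\sigma_1^2}\right\}
\end{aligned}
\tag{2.2}
$$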


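and the log-LRT is given by

$$
d(x) = 1 \quad\text{iff}\quad \log\left[\frac{\varphi_1(T_n)}{\varphi_0(T_n)}\right] > n\eta,
$$

or equivalently,

$$
d(x) = 1 \quad\text{iff}\quad \frac{\left(\frac{1}{n}T_n - \mu_0\right)^2}{\sigma_0^2} - \frac{\left(\frac{1}{n}T_n - \mu_1\right)^2}{\sigma_1^2} - \gamma > 0
\tag{2.3}
$$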


where $\gamma = 2\eta - (2/n)\log(\sigma_0/\sigma_1)$. We can assume without loss of generality that $\mu_0 < \mu_1$;
the case of $\mu_1 = \mu_0$ is unlikely to occur in practice and will not be considered, and the
case of $\mu_1 < \mu_0$ follows the same procedure with the appropriate sign change. Assume
first that $\sigma_0^2 = \sigma_1^2$. If this is true, the expression on the left side of (2.3) is a linear function
of $T_n$, and it is easy to see that the log-LRT has the form (1.3) with $A = [n\gamma', \infty)$, where
$n\gamma'$ is the root of the linear function in (2.3). The error probabilities are then given
approximately by

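$$
\begin{aligned}
P_0 &= \Phi\left[-\sqrt{n}\,\frac{\gamma' - \mu_0}{\sigma_0}\right] \\
P_1 &= \Phi\left[\sqrt{n}\,\frac{\gamma' - \mu_1}{\sigma_1}\right]
\end{aligned}
\tag{2.4}
$$

where

$$
\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt.
$$

Now in order to have $P_0 = \alpha$, we must take $\gamma' = -\frac{\sigma_0}{\sqrt{n}}\Phi^{-1}(\alpha) + \mu_0$, and substituting
for $\gamma'$ in the expression for $P_1$ we obtain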


$$
P_1 = \Phi\left[-\Phi^{-1}(\alpha) - \sqrt{n}\,\frac{\mu_1 - \mu_0}{\sigma_1}\right].
\tag{2.5}
$$


Now $\Phi(z)$ is an increasing function of $z$ and so to minimize $P_1$, it is necessary to make
the argument of $\Phi$ in (2.5) as small as possible; that is, to make the argument large in
magnitude and negative. The term $-\sqrt{n}(\mu_1 - \mu_0)/\sigma_1$ is negative, since we are assuming
that $\mu_0 < \mu_1$, and it increases in magnitude as $n$ increases, so that $P_1 \to 0$. The quantity
$(\mu_1 - \mu_0)/\sigma_1$ determines the rate at which $P_1$ goes to zero, and we can see that the best
asymptotic performance results when this quantity is maximized. Since it is positive, it
is also clear that maximizing $(\mu_1 - \mu_0)/\sigma_1$ is equivalent to maximizing the quantity

$$
S_1 = \frac{(\mu_1 - \mu_0)^2}{\sigma_1^2}.
\tag{2.6}
$$


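Assuming now that $\sigma_0^2 > \sigma_1^2$, we will obtain the same performance measure; the
case of $\sigma_0^2 < \sigma_1^2$ will not be considered since the analysis parallels the case of $\sigma_0^2 > \sigma_1^2$
and the same result is obtained. We see that the test (2.3) is identical to the test (1.3) if
$A = [n\gamma_1, n\gamma_2]$, where $n\gamma_1$ and $n\gamma_2$ are the roots of the quadratic in (2.3). The quantities
$\gamma_1$, $\gamma_2$ are given by the expressions

$$
\begin{aligned}
\gamma_1 &= \frac{\mu_1\sigma_0^2 - \mu_0\sigma_1^2 - \sigma_0\sigma_1\sqrt{(\mu_1 - \mu_0)^2 - \gamma(\sigma_0^2 - \sigma_1^2)}}{\sigma_0^2 - \sigma_1^2} \\
\gamma_2 &= \frac{\mu_1\sigma_0^2 - \mu_0\sigma_1^2 + \sigma_0\sigma_1\sqrt{(\mu_1 - \mu_0)^2 - \gamma(\sigma_0^2 - \sigma_1^2)}}{\sigma_0^2 - \sigma_1^2}
\end{aligned}
\tag{2.7}
$$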


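and the error probabilities are given approximately by

$$
\begin{aligned}
P_0 &= \Phi\left[\sqrt{n}\,\frac{\gamma_2 - \mu_0}{\sigma_0}\right] - \Phi\left[\sqrt{n}\,\frac{\gamma_1 - \mu_0}{\sigma_0}\right] \\
P_1 &= \Phi\left[-\sqrt{n}\,\frac{\gamma_2 - \mu_1}{\sigma_1}\right] + \Phi\left[\sqrt{n}\,\frac{\gamma_1 - \mu_1}{\sigma_1}\right].
\end{aligned}
\tag{2.8}
$$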


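Substituting for $\gamma_1$ and $\gamma_2$ yields

$$
\begin{aligned}
P_0 &= \Phi\left[\sqrt{n}\,\frac{\sigma_0(\mu_1 - \mu_0) + \sigma_1 v(\gamma)}{\sigma_0^2 - \sigma_1^2}\right] - \Phi\left[\sqrt{n}\,\frac{\sigma_0(\mu_1 - \mu_0) - \sigma_1 v(\gamma)}{\sigma_0^2 - \sigma_1^2}\right] \\
P_1 &= \Phi\left[-\sqrt{n}\,\frac{\sigma_1(\mu_1 - \mu_0) + \sigma_0 v(\gamma)}{\sigma_0^2 - \sigma_1^2}\right] + \Phi\left[-\sqrt{n}\,\frac{-\sigma_1(\mu_1 - \mu_0) + \sigma_0 v(\gamma)}{\sigma_0^2 - \sigma_1^2}\right]
\end{aligned}
\tag{2.9}
$$

where $v(\gamma) = \sqrt{(\mu_1 - \mu_0)^2 - \gamma(\sigma_0^2 - \sigma_1^2)}$. Thus we have the approximate error probabilities
given as functions of a parameter $\gamma$.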
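The two-threshold construction can be checked numerically. The sketch below computes the thresholds from the closed-form roots, verifies that they are zeros of the quadratic obtained from the Gaussian log-LRT (written here in terms of $u = T_n/n$, one equivalent way of expressing it), and evaluates the Gaussian approximations to the error probabilities. The parameter values and function names are hypothetical.

```python
import math

def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def two_thresholds(mu0, mu1, s0, s1, gamma):
    """Roots gamma_1 <= gamma_2 following (2.7); requires s0 > s1."""
    d = s0 ** 2 - s1 ** 2
    v = math.sqrt((mu1 - mu0) ** 2 - gamma * d)
    centre = mu1 * s0 ** 2 - mu0 * s1 ** 2
    return (centre - s0 * s1 * v) / d, (centre + s0 * s1 * v) / d

def quadratic(u, mu0, mu1, s0, s1, gamma):
    """(u - mu0)^2/s0^2 - (u - mu1)^2/s1^2 - gamma, with u = T_n / n."""
    return (u - mu0) ** 2 / s0 ** 2 - (u - mu1) ** 2 / s1 ** 2 - gamma

def error_probabilities(n, mu0, mu1, s0, s1, gamma):
    """Gaussian approximations for the region A = [n*gamma_1, n*gamma_2]."""
    g1, g2 = two_thresholds(mu0, mu1, s0, s1, gamma)
    rn = math.sqrt(n)
    p0 = phi_cdf(rn * (g2 - mu0) / s0) - phi_cdf(rn * (g1 - mu0) / s0)
    p1 = phi_cdf(rn * (g1 - mu1) / s1) + phi_cdf(-rn * (g2 - mu1) / s1)
    return p0, p1

# Hypothetical moments and threshold parameter.
params = dict(mu0=0.0, mu1=1.0, s0=2.0, s1=1.0, gamma=-1.0)
g1, g2 = two_thresholds(**params)
p0, p1 = error_probabilities(25, **params)
```

With these values the roots are $\gamma_1 = 0$ and $\gamma_2 = 8/3$, and the quadratic is positive between them, so the test decides $H_1$ exactly when $T_n/n$ falls inside that interval.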

For  situations  where  the  error  probabilities  are  relatively  small,  little  is  to  be 
gained  by  preferring  a  two  threshold  test  to  a  single  threshold  test.  This  is  the  gist  of 
Proposition  2  below.  In  order  to  make  the  proposition  precise,  it  is  necessary  to  assume
that  Tn  is  truly  Gaussian  and  not  merely  an  approximation. 


Proposition 2. Let $T_n$ be a test statistic which has the distribution $\mathcal{N}(n\mu_i, n\sigma_i^2)$
under hypothesis $H_i$ for $i = 0, 1$, with $\sigma_0^2 > \sigma_1^2$. Let $P_0^{(2)}$ and $P_1^{(2)}$ denote the error
probabilities for the decision rule $d^{(2)}$ of the form (1.3) with $A = [n\gamma_1, n\gamma_2]$, where $\gamma_1$ and
$\gamma_2$ are given by (2.7). Let $P_0^{(1)}$ and $P_1^{(1)}$ denote the error probabilities for the decision
rule $d^{(1)}$, also of the form (1.3), but with $A = [n\gamma_1, \infty)$. Assume that the thresholds are
chosen for each sample size $n$ so that $P_0^{(2)} = \alpha$. Then as $n \to \infty$, we have

$$
\frac{P_1^{(2)} - P_1^{(1)}}{P_1^{(2)}} \to 0.
$$


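Proof. Define

$$
\begin{aligned}
\nu_1 &= \frac{\sigma_1(\mu_1 - \mu_0) + \sigma_0 v(\gamma)}{\sigma_0^2 - \sigma_1^2}, &
\lambda_1 &= \frac{\sigma_0(\mu_1 - \mu_0) + \sigma_1 v(\gamma)}{\sigma_0^2 - \sigma_1^2}, \\
\nu_2 &= \frac{-\sigma_1(\mu_1 - \mu_0) + \sigma_0 v(\gamma)}{\sigma_0^2 - \sigma_1^2}, &
\lambda_2 &= \frac{\sigma_0(\mu_1 - \mu_0) - \sigma_1 v(\gamma)}{\sigma_0^2 - \sigma_1^2}.
\end{aligned}
$$

Then from (2.9) we have

$$
\begin{aligned}
P_1^{(2)} &= \Phi(-\sqrt{n}\,\nu_1) + \Phi(-\sqrt{n}\,\nu_2) \\
P_1^{(1)} &= \Phi(-\sqrt{n}\,\nu_2)
\end{aligned}
$$

and

$$
P_0^{(2)} = \Phi(\sqrt{n}\,\lambda_1) - \Phi(\sqrt{n}\,\lambda_2).
\tag{2.10}
$$

Since $\lambda_1 > \sigma_0(\mu_1 - \mu_0)/(\sigma_0^2 - \sigma_1^2) > 0$, the first term of (2.10) converges to 1 as $n \to \infty$.
But $P_0^{(2)} = \alpha$, so the second term converges to $1 - \alpha$. Therefore $\lambda_2 \to 0$. This implies
that $v(\gamma) \to (\sigma_0/\sigma_1)(\mu_1 - \mu_0)$ and thus

$$
\nu_1 \to \frac{(\sigma_0^2 + \sigma_1^2)(\mu_1 - \mu_0)}{\sigma_1(\sigma_0^2 - \sigma_1^2)} > 0, \qquad
\nu_2 \to \frac{\mu_1 - \mu_0}{\sigma_1} > 0.
$$

The proposition will follow if we can show that

$$
\frac{\Phi(-\sqrt{n}\,\nu_1)}{\Phi(-\sqrt{n}\,\nu_2)} \to 0.
$$

To do this, use the inequalities [8, p. 39]

$$
-\frac{1}{\sqrt{2\pi}\,x}\left(1 - \frac{1}{x^2}\right) e^{-x^2/2} < \Phi(x) < -\frac{1}{\sqrt{2\pi}\,x}\, e^{-x^2/2}
$$

which are valid for $x < 0$. Thus we have

$$
\frac{\Phi(-\sqrt{n}\,\nu_1)}{\Phi(-\sqrt{n}\,\nu_2)} < \frac{\nu_2}{\nu_1}\left(\frac{n\nu_2^2}{n\nu_2^2 - 1}\right) \exp\left[-\frac{n}{2}(\nu_1^2 - \nu_2^2)\right] \to 0.
$$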


Because  the  test  statistics  which  we  consider  are  only  approximately  normal,  this 
proposition  does  not  directly  apply,  and  it  has  been  presented  to  provide  a  heuristic 
argument  for  a  single  threshold  test.  In  fact,  the  proof  of  the  proposition  depends  in  a 
crucial  way  on  the  tail  behavior  of  the  distributions,  and  for  a  series  which  converges 
under  a  central  limit  theorem,  convergence  is  usually  slowest  in  the  tail  region.  In  most 
practical situations, however, the realizations of the test statistic under $H_i$ tend to pile
up around $n\mu_i$, so that if $n(\mu_1 - \mu_0)$ is large then a two threshold test offers no advantage
over  a  single  threshold  test.  Thus  we  may  justify  a  single  threshold  test  not  only  on  a 
heuristic  basis  but  also  from  practical  considerations. 
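The claim of Proposition 2 can be illustrated numerically under its exact-Gaussian assumption. In the sketch below the two-threshold test is parametrized directly by the square-root quantity $v$ appearing in (2.9); for each $n$ a bisection finds the $v$ that gives $P_0 = \alpha$, and the relative gap between the two-threshold and single-threshold values of $P_1$ is then computed. The parameter values, the bisection bracket, and the function names are illustrative assumptions.

```python
import math

def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def relative_gap(n, alpha, mu0, mu1, s0, s1):
    """(P1_two - P1_single) / P1_two when the two-threshold test has P0 = alpha."""
    d = s0 ** 2 - s1 ** 2          # requires s0 > s1
    delta = mu1 - mu0              # requires mu1 > mu0
    rn = math.sqrt(n)

    def p0_two(v):
        lam1 = (s0 * delta + s1 * v) / d
        lam2 = (s0 * delta - s1 * v) / d
        return phi_cdf(rn * lam1) - phi_cdf(rn * lam2)

    # p0_two is increasing in v, so bisection finds the v giving P0 = alpha.
    lo, hi = 0.0, 100.0 * delta * s0 / s1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p0_two(mid) < alpha:
            lo = mid
        else:
            hi = mid
    v = 0.5 * (lo + hi)

    nu1 = (s1 * delta + s0 * v) / d
    nu2 = (s0 * v - s1 * delta) / d
    extra = phi_cdf(-rn * nu1)     # P1_two - P1_single (upper tail beyond n*gamma_2)
    single = phi_cdf(-rn * nu2)    # P1_single (lower tail below n*gamma_1)
    return extra / (extra + single)

gaps = [relative_gap(n, 0.05, 0.0, 1.0, 2.0, 1.0) for n in (10, 100, 400)]
```

For these hypothetical moments the relative gap shrinks rapidly with $n$, consistent with the exponential factor in the bound used in the proof.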

We are now ready to derive the performance measure $S_1$ for the case where $\sigma_0^2 >
\sigma_1^2$ and where a single threshold test of the form (1.3) is used. With $A = [n\gamma, \infty)$, we
have the error probability under $H_0$ given approximately by

$$
P_0 = \Phi\left[-\sqrt{n}\,\frac{\gamma - \mu_0}{\sigma_0}\right].
$$

If we set $P_0 = \alpha$ then we find the value of $\gamma$ to be

$$
\gamma = -\frac{\sigma_0}{\sqrt{n}}\,\Phi^{-1}(\alpha) + \mu_0.
$$

On the other hand, the error probability under $H_1$ is given approximately by

$$
P_1 = \Phi\left[\sqrt{n}\,\frac{\gamma - \mu_1}{\sigma_1}\right]
= \Phi\left[-\frac{\sigma_0}{\sigma_1}\,\Phi^{-1}(\alpha) - \sqrt{n}\,\frac{\mu_1 - \mu_0}{\sigma_1}\right]
\tag{2.11}
$$


where the last equality in (2.11) is obtained by making the substitution for $\gamma$. Now the
quantity $\sigma_0/\sigma_1$ does not depend on the sample size $n$. Therefore the quantity $(\mu_1 - \mu_0)/\sigma_1$
again determines the rate at which $P_1$ converges to zero. By the same reasoning as before,
then, we see that the best asymptotic performance results when $S_1$, as given by (2.6), is
maximized. 
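The single-threshold design can be sketched as follows: choose the threshold so that the approximate $P_0$ equals $\alpha$, then evaluate the approximate $P_1$ both directly and through the closed form obtained after substitution. The helper that searches for a sample size, the parameter values, and all names are illustrative assumptions; the probabilities are Gaussian approximations, not exact error rates.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def single_threshold_design(n, alpha, mu0, mu1, s0, s1):
    """Threshold gamma with P0 = alpha and the resulting P1, for A = [n*gamma, inf)."""
    q = STD_NORMAL.inv_cdf(alpha)
    gamma = mu0 - s0 * q / math.sqrt(n)
    p1_direct = STD_NORMAL.cdf(math.sqrt(n) * (gamma - mu1) / s1)
    # Closed form after substituting gamma, as in (2.11).
    p1_closed = STD_NORMAL.cdf(-(s0 / s1) * q - math.sqrt(n) * (mu1 - mu0) / s1)
    return gamma, p1_direct, p1_closed

def sample_size_for(p1_target, alpha, mu0, mu1, s0, s1):
    """Smallest n whose approximate P1 does not exceed p1_target."""
    n = 1
    while single_threshold_design(n, alpha, mu0, mu1, s0, s1)[2] > p1_target:
        n += 1
    return n

gamma, p1_a, p1_b = single_threshold_design(100, 0.05, 0.0, 1.0, 2.0, 1.0)
# Same mean separation, but a larger sigma_1 (smaller S_1) in the second case.
n_small_sigma1 = sample_size_for(0.01, 0.05, 0.0, 1.0, 2.0, 1.0)
n_large_sigma1 = sample_size_for(0.01, 0.05, 0.0, 1.0, 2.0, 2.0)
```

In this example, doubling $\sigma_1$ (which divides $S_1$ by four) roughly doubles the sample size needed for the same target, illustrating how a smaller $S_1$ slows the decay of $P_1$.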

The  results  obtained  in  this  section  do  not  depend  on  the  form  of  the  test  statistic 
$T_n$, only on the assumption that there exist the constants $\mu_0$, $\mu_1$, $\sigma_0$, and $\sigma_1$ such that
$(T_n - n\mu_i)/\sqrt{n\sigma_i^2}$ converges in distribution to a standard normal random variable when
$H_i$ is true. In the remainder of this chapter, we restrict our attention to test statistics of
the form $T_n = \sum_{i=1}^{n} g(x_i)$ where the conditions of Theorem 1 hold. Thus the “moments”
$\mu_0$, $\mu_1$, $\sigma_0$, and $\sigma_1$ are given by (1.7) and (1.8) and the performance measure $S_1$ becomes
a functional of $g$.

2.2  The  Optimal  Nonlinearity 

In  the  first  section  of  this  chapter  we  showed  heuristically  that  the  best  test 
statistic  in  the  asymptotic  sense  for  the  Neyman- Pearson  problem  (2.1)  is  that  for  which 
the performance measure $S_1$ is maximized. In this section, we will consider the following
optimization  problem 


$$
\text{maximize}\quad S_1(g) = \frac{[\mu_1(g) - \mu_0(g)]^2}{\sigma_1^2(g)}
\tag{2.12}
$$

subject to the constraints that $E_i\, g^2(X_1) < \infty$ for $i = 0, 1$. Thus if $g_1$ solves (2.12), then
the test statistic $T_n = \sum_{i=1}^{n} g_1(x_i)$ is optimal in the sense of Section 2.1 over the class of all
memoryless test statistics (1.4). In the next section conditions are given which guarantee
that the nonlinearity $g_1$ derived in this section satisfies the constraint $E_1\, g_1^2(X_1) < \infty$.
In order to have $E_0\, g_1^2(X_1) < \infty$, then, it is sufficient to require that $f_0(x)/f_1(x)$ be
bounded for all $x$, and we shall make this assumption. We shall also take this condition
to mean that $f_1(x) = 0 \Rightarrow f_0(x) = 0$ as well. We assume that $g$ and all the densities
involved are continuous so that we can apply the classical techniques from the calculus
of variations. Naturally we will have to assume that the conditions of Theorem 1 are
satisfied under each hypothesis, so that the test statistic $T_n$ satisfies our assumption
of asymptotic normality. In this section, however, we shall require the more stringent
condition that the observed process be $m$-dependent under either hypothesis. Observe
that for an $m$-dependent process the expression for $\sigma_i^2(g)$ as given by (1.8) becomes

$$
\sigma_i^2(g) = E_i\, g^2(X_1) + 2\sum_{j=1}^{m} E_i\, g(X_1)g(X_{j+1}) - (2m + 1)[E_i\, g(X_1)]^2.
\tag{2.13}
$$

At  the  end  of  this  chapter,  we  shall  discuss  the  extension  of  the  m-dependent  results  to 
the more general case of $\phi$-mixing processes, and show that for all practical purposes,
$\phi$-mixing processes can be approximated by $m$-dependent ones.

In the remainder of this section, we derive an integral equation for which the optimal nonlinearity $g_1$ is a solution. We begin by observing that the value of $S_1$ is unchanged when $g$ is multiplied by a constant, hence we can maximize $(\mu_1 - \mu_0)^2$ with $\sigma_1^2$ held constant. But under our assumption that $\mu_1 - \mu_0 > 0$, maximizing $(\mu_1 - \mu_0)^2$ is equivalent to maximizing $\mu_1 - \mu_0$. We will therefore introduce a Lagrange multiplier $\lambda$ and consider the problem

$$\text{maximize} \quad \mu_1 - \mu_0 - \lambda\sigma_1^2. \tag{2.14}$$

Now define $J(g) = \mu_1(g) - \mu_0(g) - \lambda\sigma_1^2(g)$. A necessary condition for $g$ to maximize $J(\cdot)$ is that the Gateaux variation of $J$ evaluated at $g$ vanish. Thus we must have $\frac{d}{d\epsilon} J(g + \epsilon\,\delta g)\big|_{\epsilon=0} = 0$, where $\delta g$ is an arbitrary continuous function satisfying $E_i\, \delta g^2(X_1) < \infty$ for $i = 0, 1$.
Since 


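$$\begin{aligned} J(g + \epsilon\,\delta g) = J(g) &+ \epsilon \Big\{ E_1\, \delta g(X_1) - E_0\, \delta g(X_1) - 2\lambda \Big[ E_1\, g(X_1)\delta g(X_1) \\ &\quad + \sum_{j=1}^{m} \big[ E_1\, g(X_1)\delta g(X_{j+1}) + E_1\, g(X_{j+1})\delta g(X_1) \big] \\ &\quad - (2m+1)\, E_1\, g(X_1)\, E_1\, \delta g(X_1) \Big] \Big\} + \epsilon^2 \big[ -\lambda\sigma_1^2(\delta g) \big], \end{aligned} \tag{2.15}$$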


which is a quadratic function in $\epsilon$, the Gateaux variation is given by the coefficient of $\epsilon$. Denote by $f_i$ and $f_i^{(j)}$ the densities of $X_1$ and $(X_1, X_{j+1})$, respectively, under $H_i$. If we introduce these densities into the above expression and set the coefficient of $\epsilon$ equal to 0, we obtain


$$0 = \int \delta g(x) \Big\{ f_1(x) - f_0(x) - 2\lambda \Big[ f_1(x) g(x) + \int \Big( \sum_{j=1}^{m} \big[ f_1^{(j)}(x,y) + f_1^{(j)}(y,x) \big] - (2m+1) f_1(x) f_1(y) \Big) g(y)\,dy \Big] \Big\}\,dx. \tag{2.16}$$

The right-hand side of (2.16) will be zero for arbitrary $\delta g$ iff the quantity in the braces is identically zero. Setting this expression equal to zero, we easily derive the following integral equation:

$$2\lambda g(x) = \frac{f_1(x) - f_0(x)}{f_1(x)} + 2\lambda \int K_1(x,y) g(y)\,dy \tag{2.17}$$

with the kernel $K_1$ given by

$$K_1(x,y) = (2m+1) f_1(y) - \frac{1}{f_1(x)} \sum_{j=1}^{m} \big[ f_1^{(j)}(x,y) + f_1^{(j)}(y,x) \big]. \tag{2.18}$$

In the next section we will discuss the conditions which guarantee the existence of a solution to this integral equation. For now, we note that in order to divide by $f_1(x)$ as we have done, we require the absolute continuity condition $f_1(x) = 0 \Rightarrow f_0(x) = 0$.

We have derived the integral equation (2.17) as a necessary condition for $g$ to solve the maximization problem (2.14). However, if $g$ solves (2.17) then $J(g + \epsilon\,\delta g) = J(g) - \epsilon^2 \lambda\sigma_1^2(\delta g)$, so that $J(g) \ge J(g + \epsilon\,\delta g)$ provided $\lambda > 0$. Thus if $\lambda > 0$, the condition that $g$ solve the integral equation (2.17) is also sufficient for $g$ to solve the maximization problem. From the form of (2.17) it is clear that $\lambda$ determines the scaling of $g$; thus we may take an arbitrary (positive) value for $\lambda$. In the analysis that follows, it will be convenient to take $\lambda = \tfrac{1}{2}$, and with this value the integral equation (2.17) becomes

$$g(x) = \frac{f_1(x) - f_0(x)}{f_1(x)} + \int K_1(x,y) g(y)\,dy. \tag{2.19}$$

If we make the substitution $g(x) = h(x)/\sqrt{f_1(x)}$, the integral equation (2.19) becomes

$$h(x) = \frac{f_1(x) - f_0(x)}{\sqrt{f_1(x)}} + \int K_1^*(x,y) h(y)\,dy \tag{2.20}$$

which has the symmetric kernel

$$K_1^*(x,y) = (2m+1)[f_1(x) f_1(y)]^{1/2} - [f_1(x) f_1(y)]^{-1/2} \sum_{j=1}^{m} \big[ f_1^{(j)}(x,y) + f_1^{(j)}(y,x) \big]. \tag{2.21}$$

Our purpose for making the substitution for $g$ to get the equation (2.20) is to permit us to apply the Hilbert-Schmidt theory for symmetric integral equations. This is the topic of the next section.

Consider now $\sigma_1^2(g_1)$, which we may write in the form

$$\begin{aligned} \sigma_1^2(g_1) &= \int g_1^2(x) f_1(x)\,dx + \sum_{j=1}^{m} \iint g_1(x) g_1(y) \big[ f_1^{(j)}(x,y) + f_1^{(j)}(y,x) \big]\,dx\,dy \\ &\quad - (2m+1) \iint g_1(x) g_1(y) f_1(x) f_1(y)\,dx\,dy \\ &= \int g_1(x) f_1(x) \Big[ g_1(x) + \int \Big\{ \frac{1}{f_1(x)} \sum_{j=1}^{m} \big[ f_1^{(j)}(x,y) + f_1^{(j)}(y,x) \big] - (2m+1) f_1(y) \Big\} g_1(y)\,dy \Big]\,dx \\ &= \int g_1(x) f_1(x) \Big[ g_1(x) - \int K_1(x,y) g_1(y)\,dy \Big]\,dx. \end{aligned}$$

From this expression it can be seen that if $g_1$ solves the integral equation (2.19), then $\sigma_1^2(g_1) = \mu_1(g_1) - \mu_0(g_1)$, and thus $S_1(g_1) = \mu_1(g_1) - \mu_0(g_1)$ is the optimal value of $S_1$. We summarize the results of this section in the following theorem.


Theorem 3. If the process $\{X_i\}$ is m-dependent under both $H_0$ and $H_1$, then a sufficient condition for $g_1$ to maximize $S_1$ is that $g_1$ solve the integral equation (2.19). Furthermore, if $g_1$ solves (2.19) then $S_1(g_1) = \mu_1(g_1) - \mu_0(g_1)$.
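
Theorem 3 can be illustrated on a finite alphabet, where the integrals in the kernel and the integral equation become sums and the equation becomes a small linear system. The sketch below is only an illustration under assumptions that are not in the text: the process $X_i = Z_i + Z_{i+1}$ with iid Bernoulli $Z_i$ (which is 1-dependent), the parameter values 0.3 and 0.6, and all function names are hypothetical. Because a solution $g$ is determined only up to an additive constant, the last component of $g$ is pinned to zero.

```python
from itertools import product

M = 1  # the example process is 1-dependent


def pmfs(p):
    """Marginal pmf of X_1 and joint pmf of (X_1, X_2) for X_i = Z_i + Z_{i+1},
    where the Z_i are iid Bernoulli(p)."""
    f = [0.0] * 3
    joint = [[0.0] * 3 for _ in range(3)]
    for z in product((0, 1), repeat=3):
        w = 1.0
        for bit in z:
            w *= p if bit else 1.0 - p
        x, y = z[0] + z[1], z[1] + z[2]
        f[x] += w
        joint[x][y] += w
    return f, joint


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def optimal_nonlinearity(p0, p1):
    """Solve the discrete analogue of g = (f1 - f0)/f1 + K g with g[-1] = 0."""
    f0, _ = pmfs(p0)
    f1, j1 = pmfs(p1)
    n = len(f1)
    # Discrete analogue of the kernel K_1(x, y).
    kern = [[(2 * M + 1) * f1[y] - (j1[x][y] + j1[y][x]) / f1[x]
             for y in range(n)] for x in range(n)]
    # I - K annihilates constants, so drop the last row and column.
    a = [[(1.0 if x == y else 0.0) - kern[x][y] for y in range(n - 1)]
         for x in range(n - 1)]
    b = [(f1[x] - f0[x]) / f1[x] for x in range(n - 1)]
    return solve(a, b) + [0.0], f0, f1, j1


def sigma1_sq(g, f1, j1):
    """Asymptotic variance under H_1 for an m-dependent process with m = 1."""
    mu = sum(gx * fx for gx, fx in zip(g, f1))
    second = sum(gx * gx * fx for gx, fx in zip(g, f1))
    cross = sum(g[x] * g[y] * j1[x][y] for x in range(3) for y in range(3))
    return second + 2.0 * cross - (2 * M + 1) * mu ** 2


g, f0, f1, j1 = optimal_nonlinearity(0.3, 0.6)
mu0 = sum(gx * fx for gx, fx in zip(g, f0))
mu1 = sum(gx * fx for gx, fx in zip(g, f1))
var1 = sigma1_sq(g, f1, j1)
print("g =", g)
print("sigma_1^2 =", var1, "mu_1 - mu_0 =", mu1 - mu0)
```

The two printed quantities agree, which is the identity $\sigma_1^2(g_1) = \mu_1(g_1) - \mu_0(g_1)$ behind the last statement of the theorem.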


2.3  The  Solution  of  the  Integral  Equation 


To apply the theory of Fredholm equations we require the following two conditions:

$$\text{(a)} \quad \int \frac{|f_1(x) - f_0(x)|^2}{f_1(x)}\,dx < \infty$$

$$\text{(b)} \quad \iint |K_1^*(x,y)|^2\,dx\,dy < \infty.$$


Under the assumption that $f_0(x)/f_1(x)$ is bounded, the condition (a) follows easily. To show condition (b), we assume that the densities $f_1^{(j)}(x,y)$, $j = 1, \ldots, m$, have the diagonal expansion [5]

$$f_1^{(j)}(x,y) = f_1(x) f_1(y) \sum_{n=0}^{\infty} a_n^{(j)} \theta_n(x) \theta_n(y), \tag{2.22}$$

where the functions $\{\theta_n\}$ are orthonormal in the sense that $\int \theta_m(x)\theta_n(x) f_1(x)\,dx = \delta_{mn}$. Some examples of densities which are known to have such an expansion are the Gaussian and gamma densities. Consider now the terms in the expansion of $|K_1^*|^2$. We examine only the terms of the form $f_1^{(j)}(x,y) f_1^{(k)}(x,y)/[f_1(x) f_1(y)]$, the other terms being more obviously integrable. If we introduce the expansion (2.22) and apply the orthogonality relation, we have

$$\iint \frac{f_1^{(j)}(x,y) f_1^{(k)}(x,y)}{f_1(x) f_1(y)}\,dx\,dy = \sum_{n=0}^{\infty} a_n^{(j)} a_n^{(k)} < \infty$$

from which the condition (b) follows. Conditions (a) and (b) are sufficient to guarantee that the solution $h_1(x) = \sqrt{f_1(x)}\, g_1(x)$, if it exists, is square integrable, and this in turn implies that $E_1\, g_1^2(X_1) < \infty$.

In the iid case, the kernel $K_1^*$ reduces to $[f_1(x) f_1(y)]^{1/2}$, and it is easy to verify the solution

$$h(x) = \frac{c f_1(x) - f_0(x)}{\sqrt{f_1(x)}}$$

where $c$ is an arbitrary constant. Note that the absolute term of the integral equation is of this form when $c = 1$. We may therefore define $h_{iid}$ by

$$h_{iid}(x) = \frac{f_1(x) - f_0(x)}{\sqrt{f_1(x)}} \tag{2.23}$$

and write the integral equation as

$$h(x) = h_{iid}(x) + \int K_1^*(x,y) h(y)\,dy. \tag{2.24}$$
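
The iid solution can be checked by direct substitution. The short derivation below is a sketch that uses only the iid kernel $K_1^*(x,y) = [f_1(x) f_1(y)]^{1/2}$, the candidate $h(x) = [c f_1(x) - f_0(x)]/\sqrt{f_1(x)}$, and the fact that $f_0$ and $f_1$ each integrate to one.

```latex
\begin{align*}
\int K_1^*(x,y)\,h(y)\,dy
  &= \sqrt{f_1(x)} \int \sqrt{f_1(y)}\,\frac{c f_1(y) - f_0(y)}{\sqrt{f_1(y)}}\,dy
   = (c-1)\sqrt{f_1(x)}, \\
\frac{f_1(x) - f_0(x)}{\sqrt{f_1(x)}} + (c-1)\sqrt{f_1(x)}
  &= \frac{c f_1(x) - f_0(x)}{\sqrt{f_1(x)}} = h(x).
\end{align*}
```

The constant $c$ drops out of the check, which is why it remains arbitrary.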
When the process is not iid, we can still solve the integral equation using the Hilbert-Schmidt theory, provided we can find the eigenvalues and eigenvectors of the kernel. According to the theory, a unique solution exists provided the conditions (a) and (b) above hold, and provided $+1$ is not an eigenvalue of the kernel. If 1 is an eigenvalue, a (non-unique) solution still exists, provided the absolute term $h_{iid}$ is orthogonal to every eigenvector corresponding to the eigenvalue 1. We shall see that 1 is an eigenvalue of the kernel $K_1^*$, but that in most cases a solution still exists.

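If we assume that the densities $f_1^{(j)}$, $j = 1, \ldots, m$, have the expansion (2.22) and we introduce this expansion into the kernel, we have

$$K_1^*(x,y) = \sqrt{f_1(x) f_1(y)} \left[ (2m+1) - 2 \sum_{n=0}^{\infty} \Big( \sum_{j=1}^{m} a_n^{(j)} \Big) \theta_n(x) \theta_n(y) \right]. \tag{2.25}$$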


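Usually we will have $\theta_0(x) = 1$ for such an expansion, and in such cases $K_1^*$ will have eigenvalues $\{\lambda_n\}$ and eigenvectors $\{\phi_n\}$ given by

$$\begin{aligned} \lambda_0 &= 1 \\ \lambda_n &= -2 \sum_{j=1}^{m} a_n^{(j)} \quad (n \ge 1) \\ \phi_n &= \sqrt{f_1}\,\theta_n \quad (n \ge 0). \end{aligned}$$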

Since $\lambda_0 = 1$, we must verify that $h_{iid}$ is orthogonal to $\phi_0 = \sqrt{f_1}$, which is trivial:

$$\int h_{iid}(x) \phi_0(x)\,dx = \int [f_1(x) - f_0(x)]\,dx = 0.$$

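If $\lambda_n \ne 1$, $n \ge 1$, we have the solution

$$\begin{aligned} h(x) &= h_{iid}(x) + \sum_{n=1}^{\infty} \frac{\lambda_n c_n}{1 - \lambda_n} \phi_n(x) + c\,\phi_0(x) \\ &= h_{iid}(x) + \sum_{n=1}^{\infty} \frac{\lambda_n c_n}{1 - \lambda_n} \theta_n(x) \sqrt{f_1(x)} + c \sqrt{f_1(x)} \end{aligned}$$

with

$$c_n = \int h_{iid}(x) \phi_n(x)\,dx = \int [f_1(x) - f_0(x)]\,\theta_n(x)\,dx \tag{2.26}$$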

and $c$ an arbitrary constant. Therefore, the nonlinearity $g$ (with $c = -1$) is given by

$$g(x) = -\frac{f_0(x)}{f_1(x)} + \sum_{n=1}^{\infty} \frac{\lambda_n c_n}{1 - \lambda_n}\,\theta_n(x). \tag{2.27}$$

2.4 Extension to $\phi$-Mixing Processes


In this section we consider again the optimization problem (2.12) under the more general case where the processes are assumed to be $\phi$-mixing. We will use a compactness argument to prove that the optimization problem has a solution $g_1$ and then show that if $g_1^{(m)}$ solves the integral equation (2.19) then the sequence $S_1(g_1^{(m)})$ converges to the optimal value $S_1(g_1)$ as $m \to \infty$. Obviously, similar results hold for the performance measure $S_0$.

First, let us define some new symbols. Let $L^2(f_1)$ denote the Hilbert space consisting of the Borel functions $g$ such that $\int g^2(x) f_1(x)\,dx < \infty$, with the inner product $(g,h) = \int g(x) h(x) f_1(x)\,dx$ and norm $\|g\| = [\int g^2(x) f_1(x)\,dx]^{1/2}$. Let $G$ be the subset of $L^2(f_1)$ which contains the elements $g$ such that $\int g(x) f_1(x)\,dx = 0$ and $\int g^2(x) f_1(x)\,dx = 1$, and note that $G$ is compact. Define

$$\sigma_{i,m}^2(g) = E_i\, g^2(X_1) + 2\sum_{j=1}^{m} E_i\, g(X_1) g(X_{j+1}) - (2m+1)[E_i\, g(X_1)]^2 \tag{2.28}$$

and

$$S_i^{(m)}(g) = \frac{[\mu_1(g) - \mu_0(g)]^2}{\sigma_{i,m}^2(g)} \tag{2.29}$$

for $i = 0, 1$. If the process is m-dependent under $H_i$ then $\sigma_{i,m}^2 = \sigma_i^2$. Finally, let $g_1^{(m)}$ denote the solution to the integral equation (2.19), which by Theorem 3 maximizes $S_1^{(m)}$ over the space $L^2(f_1)$.

Lemma 4. The functional $S_1^{(m)}$ is continuous.

Proof. That the numerator of (2.29) is continuous follows from the Schwarz inequality and the assumption (a) of Section 2.3:

$$\big| [\mu_1(g) - \mu_0(g)] - [\mu_1(h) - \mu_0(h)] \big| = \big| (g - h, (f_1 - f_0)/f_1) \big| \le \|g - h\| \cdot \|(f_1 - f_0)/f_1\|.$$

Thus we must show that $\sigma_{1,m}^2(\cdot)$ is continuous. The first term on the right-hand side of (2.28) is the composition of the maps $g \mapsto \|g\|$ and $x \mapsto x^2$, so it is continuous. The map $g \mapsto \int g(x) f_1(x)\,dx = (g, 1)$ is continuous by the Riesz representation theorem. Thus the last term of (2.28) is continuous. Now suppose $(\Omega, \mathcal{F}, P_1)$ is the underlying probability space when $H_1$ is true, and let $L^2(P_1)$ denote the Hilbert space consisting of all random variables $X$ such that $E_1 X^2 < \infty$ with the inner product $(X, Y) = E_1 XY$. The random variables $g(X_j)$ for $j = 1, 2, \ldots$ are in $L^2(P_1)$ if $g$ is in $L^2(f_1)$. Also, if $\|\cdot\|_1$ denotes the norm for $L^2(f_1)$ and $\|\cdot\|_2$ denotes the norm in $L^2(P_1)$, then $\|g - h\|_1 = \|g(X_j) - h(X_j)\|_2$ by stationarity. By the continuity of the inner product, then, $|E_1\, g(X_1) g(X_j) - E_1\, h_n(X_1) h_n(X_j)| \to 0$ as $\|g - h_n\|_1 \to 0$. Thus the remaining terms of (2.28) are continuous. □

Define the class $C$ to consist of all bivariate joint densities which have the diagonal expansion (2.22) with $\{\theta_n\}$ being polynomials.

Lemma 5. If the densities $f_1^{(j)}$, $j = 1, 2, \ldots$ are in $C$ then the functional $S_1$ is continuous.

Proof. The idea of the proof is to show that $S_1^{(m)}$ converges to $S_1$ uniformly. First we show that $\sigma_{1,m}^2(g)$ converges uniformly to $\sigma_1^2(g)$ for $g$ in $G$. Let $\{\theta_n\}$ be the orthonormal functions in the expansion (2.22). Since $g \in L^2(f_1)$, $g$ has an expansion

$$g(x) = \sum_{n=1}^{\infty} b_n \theta_n(x).$$

Then, introducing the expansion (2.22) and applying the orthogonality relation, we have

$$\begin{aligned} \iint g(x) g(y) f_1^{(j)}(x,y)\,dx\,dy &= \iint \sum_{k=1}^{\infty} \sum_{l=1}^{\infty} \sum_{n=0}^{\infty} b_k b_l a_n^{(j)} \theta_k(x) \theta_n(x) \theta_l(y) \theta_n(y) f_1(x) f_1(y)\,dx\,dy \\ &= \sum_{n=1}^{\infty} b_n^2 a_n^{(j)}. \end{aligned}$$

Since $\int g^2(x) f_1(x)\,dx = \sum b_n^2 = 1$ for $g \in G$, the maximum value of $|\iint g(x) g(y) f_1^{(j)}(x,y)\,dx\,dy|$ is equal to the maximum value of the sequence $\{|a_n^{(j)}|, n \ge 1\}$. It will be shown in Chapter 4 that this maximum occurs at either $|a_1^{(j)}|$ or $|a_2^{(j)}|$. Since $\theta_i \in G$ and $\sigma_1^2(\theta_i) = 1 + 2\sum_j a_i^{(j)}$ for $i = 1, 2$, we have $1 + 2\sum_j c_j < \infty$ where $c_j = \max(|a_1^{(j)}|, |a_2^{(j)}|)$, and this series dominates the series $\sigma_1^2(g)$ independently of $g$. Thus $\sigma_{1,m}^2(g)$ converges uniformly.

Now if $g$ is any nonconstant element of $L^2(f_1)$, then it follows that there exist constants $a$, $b$ such that $ag + b \in G$. Since $\sigma_{1,m}^2(ag + b)$ converges uniformly, and since $S_1^{(m)}$ and $S_1$ are invariant under such transformations of $g$, it follows that $S_1^{(m)}(g)$ converges uniformly to $S_1(g)$. □

We are now ready to prove the main result.

Theorem 6. If the densities $f_1^{(j)}$, $j = 1, 2, \ldots$ are in $C$ then there exists a solution $g_1 \in G$ to the optimization problem (2.12). If $g_1^{(m)}$ solves the integral equation (2.19), then the sequence $\{S_1(g_1^{(m)})\}$ converges to $S_1(g_1)$ as $m \to \infty$.

Proof. Since $S_1$ is continuous and the set $G$ is compact, there exists an element $g'$ of $G$ such that $S_1$ achieves its maximum value on the set $G$ at $g'$. If $g$ is any nonconstant element of $L^2(f_1)$, then there exist constants $a$ and $b$ such that $ag + b \in G$. But $S_1(g) = S_1(ag + b) \le S_1(g')$. Thus $g'$ solves (2.12).

Let $\epsilon > 0$. In the proof of Lemma 5, it was shown that $S_1^{(m)}$ converges uniformly to $S_1$. Thus there exists an integer $M$ such that for every $m > M$ and every $g \in L^2(f_1)$ we have $|S_1^{(m)}(g) - S_1(g)| < \epsilon$. Let $m > M$ be fixed. If $S_1^{(m)}(g_1^{(m)}) < S_1(g_1)$, then we must have $S_1^{(m)}(g_1) \le S_1^{(m)}(g_1^{(m)}) < S_1(g_1)$. Otherwise, we have $S_1(g_1^{(m)}) \le S_1(g_1) \le S_1^{(m)}(g_1^{(m)})$. In either case, $|S_1^{(m)}(g_1^{(m)}) - S_1(g_1)| < \epsilon$. This implies that

$$|S_1(g_1^{(m)}) - S_1(g_1)| < 2\epsilon. \qquad \Box$$

According to the theorem, one can achieve a value of the performance measure $S_1$ which is arbitrarily close to the optimal value by solving the integral equation (2.19) with m sufficiently large.

CHAPTER 3


THE  MINIMAX  FORMULATION 


3.1  The  Performance  Measure  S2 

Before  one  can  determine  the  optimal  test  statistic  (or  nonlinearity)  in  some  class 
of  allowable  test  statistics,  one  must  define  an  ordering  on  the  class.  Although  the 
most  natural  ordering  to  consider  is  determined  by  the  receiver  operating  characteristic 
(ROC),  the  ROC  itself  does  not  provide  a  total  ordering,  so  that  given  two  different  test 
statistics  one  cannot  always  say  which  is  the  better  by  comparing  their  ROC’s.  Rather, 
in  order  to  have  a  total  ordering,  one  must  specify  a  particular  region,  or  operating  point, 
of the ROC. Under the Neyman-Pearson formulation, this operating point is specified to be such that $P_0 = \alpha$. Such an operating point may be undesirable when the asymptotic performance is important since only $P_1$ converges to 0 while $P_0$ remains fixed. For this reason, it may be better to consider the minimax operating point where $P_0 = P_1$. The term "minimax" refers to the fact that the decision rule, including both the threshold (or the critical region) and the nonlinearity $g$, is chosen to solve the problem

$$\text{minimize} \quad \max_{i=0,1} P_i. \tag{3.1}$$

Actually, we shall consider the problem (3.1) in an asymptotic sense, similar to our method for the Neyman-Pearson formulation. Thus we attempt to maximize the rate at which $\max(P_0, P_1)$ converges to zero. Note that if the ROC is continuous, then we can always choose the critical region so that $P_0 = P_1$, and thus we actually maximize the rate at which the common value of $P_0$ and $P_1$ converges to 0. This rate is given approximately by the performance measure $S_2$, as derived in this section.

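We proceed by assuming as in Chapter 2 that the critical region $A$ is given by $[n\gamma_1, n\gamma_2]$, where $\gamma_1$ and $\gamma_2$ are given by (2.7), so that the error probabilities are given approximately by (2.9). Now if we choose the parameter $\gamma = 0$ so that $v(\gamma) = \mu_1 - \mu_0$, then the expressions for the error probabilities reduce to

$$\begin{aligned} P_0 &= \Phi\!\left[ \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma_0 - \sigma_1} \right] - \Phi\!\left[ \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma_0 + \sigma_1} \right] \\ P_1 &= \Phi\!\left[ \frac{-\sqrt{n}(\mu_1 - \mu_0)}{\sigma_0 - \sigma_1} \right] + \Phi\!\left[ \frac{-\sqrt{n}(\mu_1 - \mu_0)}{\sigma_0 + \sigma_1} \right] \end{aligned} \tag{3.2}$$

Now typically the term $\Phi[-\sqrt{n}(\mu_1 - \mu_0)/(\sigma_0 - \sigma_1)]$ is orders of magnitude smaller than the term $\Phi[-\sqrt{n}(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)]$, and in fact the ratio of these two terms goes to zero:

$$\lim_{n \to \infty} \frac{\Phi[-\sqrt{n}(\mu_1 - \mu_0)/(\sigma_0 - \sigma_1)]}{\Phi[-\sqrt{n}(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)]} = 0.$$

Thus we may approximate the first term of the expression for $P_0$ by 1 and approximate the first term of the expression for $P_1$ by 0. Then we have the error probabilities given approximately by

$$P_0 \approx P_1 \approx \Phi\!\left[ \frac{-\sqrt{n}(\mu_1 - \mu_0)}{\sigma_0 + \sigma_1} \right]. \tag{3.3}$$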

It should be clear from (3.3) that the quantity $(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)$ determines (approximately) the rate at which $P_0$ and $P_1$ converge to 0. However, if we assume as we did in Chapter 2 that $\mu_1 > \mu_0$, then maximizing this quantity is equivalent to maximizing the quantity

$$S_2 = \frac{(\mu_1 - \mu_0)^2}{(\sigma_0 + \sigma_1)^2}. \tag{3.4}$$


This performance measure $S_2$ determines approximately the rate at which $P_0$ and $P_1$ converge to 0. It has been derived also in [3] using Chernoff bounds as a nonlocal approach to the signal detection problem (1.11). The approach here is different in that the focus is on signal discrimination. Note that the performance measure $S_2$ treats equally the asymptotic variance under the two hypotheses, whereas this is not the case for the performance measure $S_1$. However, in the remainder of this chapter we shall see that $S_2$ is much more difficult to work with analytically due to the nonlinear function of $\sigma_0^2$ and $\sigma_1^2$ in the denominator.
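
To get a feel for the approximation that leads to $S_2$, the following sketch evaluates the two Gaussian tail terms $\Phi[-\sqrt{n}(\mu_1 - \mu_0)/(\sigma_0 - \sigma_1)]$ and $\Phi[-\sqrt{n}(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)]$ for a few sample sizes. The numerical values ($\mu_0 = 0$, $\mu_1 = 1$, $\sigma_0 = 2$, $\sigma_1 = 1$) and the function names are hypothetical, and the sketch assumes $\sigma_0 > \sigma_1$.

```python
import math


def phi(t):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def tail_terms(n, mu0, mu1, sigma0, sigma1):
    """Return (negligible term, dominant term) of the approximate error probabilities."""
    d = math.sqrt(n) * (mu1 - mu0)
    small = phi(-d / (sigma0 - sigma1))
    main = phi(-d / (sigma0 + sigma1))
    return small, main


def s2(mu0, mu1, sigma0, sigma1):
    """Performance measure S_2 = (mu1 - mu0)^2 / (sigma0 + sigma1)^2."""
    return (mu1 - mu0) ** 2 / (sigma0 + sigma1) ** 2


ratios = []
for n in (9, 36, 144):
    small, main = tail_terms(n, 0.0, 1.0, 2.0, 1.0)
    ratios.append(small / main)
    print(n, main, small / main)
```

The dominant term equals $\Phi[-\sqrt{n\,S_2}]$, so a larger $S_2$ gives a faster decay of the common error probability, while the ratio of the two terms shrinks rapidly with $n$.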


3.2  The  Optimal  Nonlinearity 

As in the case of the performance measure $S_1$ of the last chapter, the nonlinearity $g_2$ which maximizes the performance measure $S_2$ is also given by the solution of an integral equation; however, the integral equation for this case is nonlinear, and we shall not be able to obtain a closed form solution. The technique used to derive the integral equation is similar to that used earlier, and we will again assume that the processes are m-dependent. The value of $S_2(g)$ remains unchanged if $g$ is multiplied by a constant, and we may therefore attempt to maximize $(\mu_1 - \mu_0)^2$ with $(\sigma_0 + \sigma_1)^2$ constrained to be constant. Equivalently, we can consider the following optimization problem which involves a Lagrange multiplier $\lambda$:

$$\text{maximize} \quad \mu_1 - \mu_0 - \lambda(\sigma_0 + \sigma_1)^2. \tag{3.5}$$

To solve this problem, define $J(g) = \mu_1(g) - \mu_0(g) - \lambda[\sigma_0(g) + \sigma_1(g)]^2$, and let $\delta g$ be an arbitrary continuous function satisfying $E_i\, \delta g^2(X_1) < \infty$ for $i = 0, 1$. A necessary condition for $g$ to solve (3.5) is that $\frac{d}{d\epsilon} J(g + \epsilon\,\delta g)\big|_{\epsilon=0} = 0$. To show the steps involved in taking the derivative, write $J(g)$ in the following form:

$$J(g) = \mu_1(g) - \mu_0(g) - \lambda\sigma_0^2(g) - \lambda\sigma_1^2(g) - 2\lambda\sqrt{\sigma_0^2(g)\,\sigma_1^2(g)}. \tag{3.6}$$

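Also define $A_i(g, \delta g)$ by $2A_i(g, \delta g) = \frac{d}{d\epsilon} \sigma_i^2(g + \epsilon\,\delta g)\big|_{\epsilon=0}$. Then $A_i(g, \delta g)$ is given by

$$A_i(g, \delta g) = E_i\, g(X_1)\delta g(X_1) + \sum_{j=1}^{m} \big[ E_i\, g(X_1)\delta g(X_{j+1}) + E_i\, g(X_{j+1})\delta g(X_1) \big] - (2m+1)\, E_i\, g(X_1)\, E_i\, \delta g(X_1). \tag{3.7}$$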


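Now we have for the contribution of the last term on the right of (3.6)

$$\frac{d}{d\epsilon} \sqrt{\sigma_0^2(g + \epsilon\,\delta g)\,\sigma_1^2(g + \epsilon\,\delta g)}\,\Big|_{\epsilon=0} = [\sigma_0^2(g)\,\sigma_1^2(g)]^{-1/2} \big[ \sigma_0^2(g) A_1(g, \delta g) + A_0(g, \delta g)\,\sigma_1^2(g) \big]. \tag{3.8}$$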

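The contributions of the other terms are immediate. Thus we have

$$\frac{d}{d\epsilon} J(g + \epsilon\,\delta g)\Big|_{\epsilon=0} = E_1\, \delta g(X_1) - E_0\, \delta g(X_1) - 2\lambda \Big[ \Big( 1 + \frac{\sigma_1}{\sigma_0} \Big) A_0(g, \delta g) + \Big( 1 + \frac{\sigma_0}{\sigma_1} \Big) A_1(g, \delta g) \Big]. \tag{3.9}$$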
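
The weights $1 + \sigma_1/\sigma_0$ and $1 + \sigma_0/\sigma_1$ come from collecting the three variance terms of (3.6). A short worked version, using only the definition $2A_i = \frac{d}{d\epsilon}\sigma_i^2$ and the chain rule, with all quantities evaluated at $\epsilon = 0$, is sketched here:

```latex
\begin{align*}
\frac{d}{d\epsilon}\Big[\sigma_0^2 + \sigma_1^2 + 2\sqrt{\sigma_0^2\,\sigma_1^2}\Big]_{\epsilon=0}
 &= 2A_0 + 2A_1 + \frac{2\,[\sigma_1^2 A_0 + \sigma_0^2 A_1]}{\sigma_0\sigma_1} \\
 &= 2\Big(1 + \frac{\sigma_1}{\sigma_0}\Big)A_0 + 2\Big(1 + \frac{\sigma_0}{\sigma_1}\Big)A_1 .
\end{align*}
```

Multiplying by $-\lambda$ and adding the derivative of $\mu_1 - \mu_0$ gives the expression for the Gateaux variation.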


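We obtain the integral equation by introducing the densities $f_0$, $f_1$, $f_0^{(j)}$, $j = 1, \ldots, m$, and $f_1^{(j)}$, $j = 1, \ldots, m$, into (3.9) and setting the result equal to 0. This yields

$$\begin{aligned} 0 = \int \delta g(x) \Big\{ f_1(x) - f_0(x) &- 2\lambda \Big( 1 + \frac{\sigma_1}{\sigma_0} \Big) \Big[ g(x) f_0(x) + \int K_0(x,y) g(y)\,dy \Big] \\ &- 2\lambda \Big( 1 + \frac{\sigma_0}{\sigma_1} \Big) \Big[ g(x) f_1(x) + \int K_1(x,y) g(y)\,dy \Big] \Big\}\,dx \end{aligned} \tag{3.10}$$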


where $K_i$ is given by

$$K_i(x,y) = \sum_{j=1}^{m} \big[ f_i^{(j)}(x,y) + f_i^{(j)}(y,x) \big] - (2m+1) f_i(x) f_i(y). \tag{3.11}$$


The equation (3.10) holds for arbitrary $\delta g$ iff the expression in braces is identically zero. Thus $g$ must solve the integral equation

$$2\lambda g(x) = \frac{f_1(x) - f_0(x)}{(1+r) f_0(x) + (1+r^{-1}) f_1(x)} - 2\lambda \int L(x,y) g(y)\,dy \tag{3.12}$$

where $r = \sigma_1/\sigma_0$ and the kernel $L$ is given by

$$L(x,y) = \frac{(1+r) K_0(x,y) + (1+r^{-1}) K_1(x,y)}{(1+r) f_0(x) + (1+r^{-1}) f_1(x)}. \tag{3.13}$$


Again we observe that $\lambda$ determines the scaling of $g$. Thus the particular value of $\lambda$ is not significant except that it must have the proper sign so that if $g$ solves the integral equation (3.12), then $\mu_1(g) > \mu_0(g)$. Consider now

$$\begin{aligned} [\sigma_0(g) + \sigma_1(g)]^2 &= (1+r)\sigma_0^2(g) + (1+r^{-1})\sigma_1^2(g) \\ &= \int g(x) \Big\{ [(1+r) f_0(x) + (1+r^{-1}) f_1(x)]\, g(x) \\ &\qquad + \int [(1+r) K_0(x,y) + (1+r^{-1}) K_1(x,y)]\, g(y)\,dy \Big\}\,dx. \end{aligned} \tag{3.14}$$

If $g$ solves the integral equation (3.12) for $\lambda = \tfrac{1}{2}$, then the expression in the braces in (3.14) reduces to $f_1(x) - f_0(x)$, and thus $[\sigma_0(g) + \sigma_1(g)]^2 = \mu_1(g) - \mu_0(g) > 0$. We shall therefore assign to $\lambda$ the value $\tfrac{1}{2}$ and henceforth consider the integral equation

$$g(x) = \frac{f_1(x) - f_0(x)}{(1+r) f_0(x) + (1+r^{-1}) f_1(x)} - \int L(x,y) g(y)\,dy. \tag{3.15}$$

Furthermore, we observe that if $g_2$ solves the integral equation (3.15) then $S_2(g_2) = \mu_1(g_2) - \mu_0(g_2)$, a result which is similar to the one obtained in Chapter 2 for the optimal value of $S_1$. We have proved the following theorem.

Theorem 7. If the process $\{X_i\}$ is m-dependent under both $H_0$ and $H_1$, then a sufficient condition for $g_2$ to maximize $S_2$ is that $g_2$ solve the integral equation (3.15). Furthermore, if $g_2$ solves (3.15) then $S_2(g_2) = \mu_1(g_2) - \mu_0(g_2)$.

In comparing the integral equation (3.15) with the integral equation (2.19), we note first of all that (3.15) is nonlinear because $r$ is a function of $g$. Let us consider now what happens when $r$ varies. If $r$ is very small, then $\sigma_1$ is much smaller than $\sigma_0$, and thus the value of the performance measure $S_2$ is very close to that of the performance measure $S_0$. In fact, $S_2 \to S_0$ as $r \to 0$. Now observe that the integral equation (3.15), when rescaled, converges as $r \to 0$ to the integral equation (2.19) which maximizes the performance measure $S_1$. This provides us with some insight into the relation between the performance measures $S_0$, $S_1$, and $S_2$ and the role that $r$ plays in the integral equation (3.15). We observe, for example, that there is a conflict of objectives for very small $r$ in that the value of the performance measure $S_2$ is approximately equal to that of $S_0$, while the integral equation (3.15) provides a nonlinearity which is close to the one which maximizes the performance measure $S_1$. A similar conflict occurs as $r$ approaches $\infty$, with the roles of $S_0$ and $S_1$ reversed. Of course, there is no conflict of objectives if $S_0 \approx S_1$, but this implies that $r \approx 1$. Thus we expect that $r$ will have a "reasonable" value on the order of one. We find this to be the case in Chapter 5 where a numerical solution to the integral equation (3.15) is found.

3.3  The  Solution  of  the  Integral  Equation 

The equation (3.15) is nonlinear because $r$ is a function of $g$, and for this reason, finding a closed form solution is rather difficult. If, however, we had clairvoyance to know the correct value of $r$, then we could find the solution by solving a linear integral
equation.  In  fact,  we  might  try  to  guess  the  value  of  r,  find  the  solution  of  the  resulting 
linear  integral  equation,  and  then  compute  r  to  verify  if  our  guess  was  correct.  This 
suggests  an  iterative  method  where  the  computed  value  of  r  from  the  previous  solution 
becomes  the  new  value  for  r  at  the  next  iteration  of  the  procedure.  This  method  is  used 
to  obtain  a  numerical  solution  to  (3.15)  in  Chapter  5;  it  is  found  that  the  successive 
values  of  r  do  in  fact  converge. 
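
The iteration on $r$ can be sketched for the simplest setting. The code below is not the numerical procedure of Chapter 5; it is a minimal illustration that assumes iid observations on a finite alphabet, so that for a fixed $r$ the linear equation has a solution of the form $g = (B_0 f_0 + B_1 f_1)/[(1+r) f_0 + (1+r^{-1}) f_1]$, here normalized so that $\mu_0 = 0$. The two pmfs and all function names are hypothetical.

```python
import math


def solve_for_fixed_r(f0, f1, r):
    """iid solution of the linear equation for a given r, normalized so mu_0 = 0."""
    w = [(1.0 + r) * p + (1.0 + 1.0 / r) * q for p, q in zip(f0, f1)]
    cross = sum(p * q / v for p, q, v in zip(f0, f1, w))
    own = sum(p * p / v for p, v in zip(f0, w))
    b1 = own / cross  # chosen so that sum(g * f0) = 0, i.e. B_0 = -1
    return [(-p + b1 * q) / v for p, q, v in zip(f0, f1, w)]


def moments(g, f):
    """Mean and standard deviation of g(X) when X has pmf f."""
    mu = sum(x * p for x, p in zip(g, f))
    var = sum(x * x * p for x, p in zip(g, f)) - mu * mu
    return mu, math.sqrt(var)


def iterate_r(f0, f1, r=1.0, tol=1e-12, max_iter=200):
    """Guess r, solve the linear problem, recompute r = sigma_1 / sigma_0, repeat."""
    for _ in range(max_iter):
        g = solve_for_fixed_r(f0, f1, r)
        _, s0 = moments(g, f0)
        _, s1 = moments(g, f1)
        r_new = s1 / s0
        if abs(r_new - r) < tol:
            return r_new, g
        r = r_new
    raise RuntimeError("r did not converge")


F0 = [0.5, 0.3, 0.2]
F1 = [0.1, 0.3, 0.6]
r_star, g_star = iterate_r(F0, F1)
m0, sd0 = moments(g_star, F0)
m1, sd1 = moments(g_star, F1)
print("r =", r_star)
print("(sigma_0 + sigma_1)^2 =", (sd0 + sd1) ** 2, "mu_1 - mu_0 =", m1 - m0)
```

At the fixed point the two printed quantities coincide, which is the relation $[\sigma_0(g) + \sigma_1(g)]^2 = \mu_1(g) - \mu_0(g)$ that holds for a solution of the nonlinear equation.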

Although we cannot find a closed form solution to (3.15), we may treat $r$ as a constant whose value is unknown, and thereby extend the analysis relating to the equation (3.15). If we make the substitution

$$h(x) = g(x) \sqrt{(1+r) f_0(x) + (1+r^{-1}) f_1(x)},$$

we obtain the integral equation

$$h(x) = \frac{f_1(x) - f_0(x)}{\sqrt{w_r(x)}} - \int L^*(x,y) h(y)\,dy \tag{3.16}$$

where the symmetric kernel $L^*$ is given by

$$L^*(x,y) = \frac{(1+r) K_0(x,y) + (1+r^{-1}) K_1(x,y)}{\sqrt{w_r(x)\, w_r(y)}} \tag{3.17}$$

and where $w_r$ is defined by

$$w_r(t) = (1+r) f_0(t) + (1+r^{-1}) f_1(t). \tag{3.18}$$


For a given value of $r$, the integral equation (3.16) is a Fredholm equation of the second kind, provided we have the conditions

$$\text{(a)} \quad \int \frac{[f_1(x) - f_0(x)]^2}{w_r(x)}\,dx < \infty$$

$$\text{(b)} \quad \iint |L^*(x,y)|^2\,dx\,dy < \infty.$$

These conditions imply that the solution $h$ is square integrable, and then it follows that $E_i\, g^2(X_1) < \infty$ for $i = 0, 1$. Note that we do not require the condition that $f_0(x)/f_1(x)$ be bounded as we did for the integral equation (2.19). Condition (a) follows from the fact that $|f_1(x) - f_0(x)|/w_r(x)$ is bounded by $(1+r)^{-1} + (1+r^{-1})^{-1}$. To show that condition (b) holds, it suffices to show that

$$\iint \left| \frac{K_i(x,y)}{\sqrt{w_r(x)\, w_r(y)}} \right|^2 dx\,dy < \infty \tag{3.19}$$

since we may then apply the Minkowski inequality. If all the joint densities involved have the expansion (2.22), then the inequality (3.19) follows from an argument similar to that for the kernel $K_1^*$ in Chapter 2. For example, consider the terms of the form

$$\frac{f_0^{(j)}(x,y)\, f_0^{(k)}(x,y)}{w_r(x)\, w_r(y)} \le \frac{f_0^{(j)}(x,y)\, f_0^{(k)}(x,y)}{(1+r)^2 f_0(x) f_0(y)}.$$

It was shown in Chapter 2 that such terms are integrable. Thus condition (b) holds.
Since the integral equation (3.16) has a symmetric kernel, the Hilbert-Schmidt theory applies as in Chapter 2. If the eigenvalues and eigenvectors of the kernel (3.17) are denoted by $\{\lambda_n\}$ and $\{\phi_n\}$, then a solution $h_2$ of the integral equation (3.16) has the form (3.20). The solution is unique if and only if $\lambda_n \ne 1$ for all $n$. Since we do not have clairvoyance to know the true value of $r$, the solution (3.20) is purely academic.


If the process is iid, then the kernel $L$ from (3.13) has the simpler form

$$L(x,y) = -\frac{(1+r) f_0(x) f_0(y) + (1+r^{-1}) f_1(x) f_1(y)}{(1+r) f_0(x) + (1+r^{-1}) f_1(x)}$$

and the integral equation (3.15) has the solution

$$g_{iid}(x) = \frac{B_0 f_0(x) + B_1 f_1(x)}{(1+r) f_0(x) + (1+r^{-1}) f_1(x)} \tag{3.21}$$


where $B_0 = [(1+r)\mu_0 - 1]$ and $B_1 = [(1+r^{-1})\mu_1 + 1]$. There are three unknown quantities in the expression (3.21): $r$, $\mu_0$, and $\mu_1$. These quantities can be found by solving the following system of nonlinear equations:

$$\begin{aligned} \mu_0 &= \int g_{iid}(x) f_0(x)\,dx \\ \mu_1 &= \int g_{iid}(x) f_1(x)\,dx \end{aligned} \tag{3.22}$$

Note that $g_{iid}(x) + C$ solves (3.15) for an arbitrary constant $C$. We may therefore take an arbitrary value for either $\mu_0$ or $\mu_1$. In fact, with $r$ fixed the first two equations in (3.22) are linear in $\mu_0$ and $\mu_1$ and are singular. Therefore the system (3.22) does not have a unique solution.


3.4  The  Performance  Measure  S3 

The performance measures $S_0$ and $S_1$ which were derived in Chapter 2 have the undesirable feature that they treat unequally the performance under the two hypotheses, and yet they are relatively convenient for analysis since they lead to linear integral equations. On the other hand, the performance measure $S_2$ treats both hypotheses equally but leads to a nonlinear integral equation. We are led to consider also the performance measure

$$S_3 = \frac{[\mu_1 - \mu_0]^2}{\sigma_0^2 + \sigma_1^2} \tag{3.23}$$

which treats both hypotheses equally and leads to a linear integral equation as well. This performance measure is derived in this section by considering Chernoff bounds for the error probabilities; the derivation is a slight modification of the method of Sadowsky and Bucklew [3], in which they derived the performance measure $S_2$.

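If the test statistic $T_n$ has a normal distribution with mean $n\mu_i$ and variance $n\sigma_i^2$, then these are the Chernoff bounds:

$$\begin{aligned} P[T_n > n\gamma] &\le \exp[-n I_i(\gamma)] \quad \text{if } \mu_i < \gamma \\ P[T_n < n\gamma] &\le \exp[-n I_i(\gamma)] \quad \text{if } \mu_i > \gamma \end{aligned} \tag{3.24}$$

where

$$I_i(\gamma) = \frac{(\mu_i - \gamma)^2}{2\sigma_i^2}. \tag{3.25}$$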


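The Chernoff bounds are asymptotically tight in the sense that

$$\begin{aligned} \lim_{n \to \infty} -\frac{1}{n} \log P[T_n > n\gamma] &= I_i(\gamma) \quad \text{if } \mu_i < \gamma \\ \lim_{n \to \infty} -\frac{1}{n} \log P[T_n < n\gamma] &= I_i(\gamma) \quad \text{if } \mu_i > \gamma \end{aligned}$$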

and they hence can provide good approximations to the error probabilities if $n$ is large. Of course, if the distribution of $T_n$ is only approximately normal, then the bounds given by (3.24) are only approximations of the true Chernoff bounds. Nevertheless, we shall proceed under the assumption that such approximations are acceptable. From (3.24) we see that if $n\gamma$ is the threshold and $\mu_0 < \gamma < \mu_1$, then $I_i$ determines (approximately) the bound for the error probability $P_i$. Since a larger value for $I_i$ results in a smaller bound for $P_i$, it is desirable to make both $I_0$ and $I_1$ as large as possible. It is obvious that $I_i$ is a convex function of $\gamma$ which takes its minimum value at $\gamma = \mu_i$. Thus as $\gamma$ increases from $\mu_0$ to $\mu_1$, $I_0$ increases and $I_1$ decreases. Sadowsky and Bucklew [3] proceeded from this point by maximizing $\min(I_0, I_1)$ and obtained the result that


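$$S_2 = 2 \max_{\mu_0 < \gamma < \mu_1}\, \min_{i=0,1} I_i(\gamma).$$

It is fairly straightforward, however, to show that

$$S_3 = 2 \min_{\mu_0 < \gamma < \mu_1} \big[ I_0(\gamma) + I_1(\gamma) \big]$$

and this justifies the use of performance measure $S_3$.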
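
The relation between the Gaussian exponents $I_i(\gamma) = (\mu_i - \gamma)^2/(2\sigma_i^2)$ and the two performance measures can be checked numerically. Under these exponents, the largest value of $\min(I_0, I_1)$ over $\mu_0 < \gamma < \mu_1$ works out to $S_2/2$, and the smallest value of $I_0 + I_1$ over the same interval works out to $S_3/2$. The sketch below confirms both by a plain grid search; the parameter values and function names are hypothetical.

```python
def rate(mu, sigma, gamma):
    """Gaussian Chernoff exponent (mu - gamma)^2 / (2 sigma^2)."""
    return (mu - gamma) ** 2 / (2.0 * sigma ** 2)


def exponents(mu0, mu1, sigma0, sigma1, steps=20000):
    """Grid search over thresholds strictly between mu0 and mu1."""
    grid = [mu0 + (mu1 - mu0) * k / steps for k in range(1, steps)]
    max_min = max(min(rate(mu0, sigma0, t), rate(mu1, sigma1, t)) for t in grid)
    min_sum = min(rate(mu0, sigma0, t) + rate(mu1, sigma1, t) for t in grid)
    return max_min, min_sum


MU0, MU1, SIG0, SIG1 = 0.0, 1.0, 2.0, 1.0
S2 = (MU1 - MU0) ** 2 / (SIG0 + SIG1) ** 2
S3 = (MU1 - MU0) ** 2 / (SIG0 ** 2 + SIG1 ** 2)
max_min, min_sum = exponents(MU0, MU1, SIG0, SIG1)
print("2 * max-min =", 2.0 * max_min, "S2 =", S2)
print("2 * min-sum =", 2.0 * min_sum, "S3 =", S3)
```

The max-min is attained where $I_0 = I_1$, at $\gamma = (\mu_0\sigma_1 + \mu_1\sigma_0)/(\sigma_0 + \sigma_1)$, and the minimum of the sum at $\gamma = (\mu_0\sigma_1^2 + \mu_1\sigma_0^2)/(\sigma_0^2 + \sigma_1^2)$.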

The following method for maximizing $S_3$ is very similar to the method in Chapter 2 for maximizing $S_1$, and we must assume that the processes are m-dependent. We know that $g$ maximizes $S_3$ if and only if $g$ maximizes $J_3(g) = \mu_1(g) - \mu_0(g) - \lambda[\sigma_0^2(g) + \sigma_1^2(g)]$, where $\lambda$ is a Lagrange multiplier. The condition that the Gateaux variation of $J_3(\cdot)$ vanish at $g$ leads to the integral equation

$$2\lambda g(x) = \frac{f_1(x) - f_0(x)}{f_0(x) + f_1(x)} - 2\lambda \int M(x,y) g(y)\,dy \tag{3.26}$$

where the kernel $M$ is given by

$$M(x,y) = \frac{K_0(x,y) + K_1(x,y)}{f_0(x) + f_1(x)}.$$

The functional $J_3$ is concave in $g$, provided $\lambda > 0$, so that the condition that $g$ solve the integral equation (3.26) is also a sufficient condition for $g$ to maximize $S_3$. We therefore take $\lambda = \tfrac{1}{2}$ and the integral equation (3.26) becomes

$$g(x) = \frac{f_1(x) - f_0(x)}{f_0(x) + f_1(x)} - \int M(x,y) g(y)\,dy. \tag{3.27}$$



We can also show easily that if $g_3$ solves the integral equation (3.27), then $\sigma_0^2(g_3) + \sigma_1^2(g_3) = \mu_1(g_3) - \mu_0(g_3)$ so that the optimal value of $S_3$ is $S_3(g_3) = \mu_1(g_3) - \mu_0(g_3)$.
These results are summarized in Theorem 8.

Theorem 8. If the process $\{X_i\}$ is m-dependent under both $H_0$ and $H_1$, then a
sufficient condition for $g_3$ to maximize $S_3$ is that $g_3$ solve the integral equation (3.27).
Furthermore, if $g_3$ solves (3.27) then $S_3(g_3) = \mu_1(g_3) - \mu_0(g_3)$.


The integral equation (3.27) can be transformed into an integral equation with
a symmetric kernel by making the substitution $h(x) = g(x)\sqrt{f_0(x) + f_1(x)}$ so that the
Hilbert-Schmidt theory applies as before. We shall not pursue this further. We shall,
however, proceed to find the optimal nonlinearity for iid processes. If the processes are
both iid, then the kernel $M$ has the form

$$M(x,y) = -\frac{f_0(x)f_0(y) + f_1(x)f_1(y)}{f_0(x) + f_1(x)}$$

and the integral equation gives us immediately the form of the iid solution:

$$g_{iid}(x) = \frac{B_0 f_0(x) + B_1 f_1(x)}{f_0(x) + f_1(x)} \tag{3.28}$$

where $B_0 = \mu_0 - 1$ and $B_1 = \mu_1 + 1$. To find the unknown constants $B_0$, $B_1$, we substitute
for $g_{iid}$ in the linear equations

$$\begin{aligned}
\mu_0 &= B_0 + 1 = \int g_{iid}(x) f_0(x)\,dx \\
\mu_1 &= B_1 - 1 = \int g_{iid}(x) f_1(x)\,dx.
\end{aligned} \tag{3.29}$$

The system (3.29) is in fact singular, so that we may take either $B_0$ or $B_1$ to be arbitrary.
Therefore we shall arbitrarily take $B_0 = 0$ and this gives us the value

$$B_1 = \left[\int \frac{f_0(x) f_1(x)}{f_0(x) + f_1(x)}\,dx\right]^{-1}. \tag{3.30}$$

Thus the iid solution has been determined explicitly.
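The explicit iid solution lends itself to a quick numerical check. The following is a minimal sketch, not part of the original derivation: it assumes unit-variance Gaussian marginals with means 0 and 1 (an illustrative choice), uses a simple trapezoidal rule, and takes $B_0 = 0$ as in the text. The helper names `normal_pdf`, `integrate`, and `iid_nonlinearity` are hypothetical.

```python
import math


def normal_pdf(x, mean):
    """Unit-variance Gaussian density (illustrative choice, not from the text)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def integrate(func, lo=-12.0, hi=12.0, steps=4800):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def iid_nonlinearity(f0, f1):
    """Return (B1, g) following (3.28) and (3.30) with B0 = 0."""
    overlap = integrate(lambda x: f0(x) * f1(x) / (f0(x) + f1(x)))
    b1 = 1.0 / overlap
    return b1, (lambda x: b1 * f1(x) / (f0(x) + f1(x)))


f0 = lambda x: normal_pdf(x, 0.0)
f1 = lambda x: normal_pdf(x, 1.0)
b1, g = iid_nonlinearity(f0, f1)

# Means and variances of g(X_1) under each hypothesis.
mu0 = integrate(lambda x: g(x) * f0(x))
mu1 = integrate(lambda x: g(x) * f1(x))
var0 = integrate(lambda x: g(x) ** 2 * f0(x)) - mu0 ** 2
var1 = integrate(lambda x: g(x) ** 2 * f1(x)) - mu1 ** 2

# iid case: S3 is the squared mean shift over the sum of the variances.
s3 = (mu1 - mu0) ** 2 / (var0 + var1)
```

Under these assumptions the computed quantities should reproduce the relations in (3.29), namely $\mu_0 = B_0 + 1 = 1$ and $\mu_1 = B_1 - 1$, and the optimal value should agree with $\mu_1 - \mu_0$ as stated in Theorem 8.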


3.5  Extension to φ-Mixing Processes


The results proved in Section 2.4 are also true for the performance measures $S_2$
and $S_3$. Because of the similarity of these two performance measures, the proof for
either case is nearly identical; therefore only the results for $S_2$ will be stated and proved.
The notation is as follows: Define the density $w = \frac{1}{2}(f_0 + f_1)$ and let $L^2(w)$ be the
Hilbert space of Borel functions $g$ such that $\int g^2(x) w(x)\,dx < \infty$ with the inner product
$\langle g, h \rangle = \int g(x) h(x) w(x)\,dx$. Note that $g \in L^2(w)$ implies that $E_i\, g^2(X_1) < \infty$ for $i = 0, 1$.
Let $\sigma_{i,m}^2(g)$ be defined by (2.28) and define

$$S_2^{(m)}(g) = \frac{[\mu_1(g) - \mu_0(g)]^2}{[\sigma_{0,m}(g) + \sigma_{1,m}(g)]^2}. \tag{3.31}$$


We denote by $C$ the class of joint densities which have a diagonal expansion, as in
Section 2.4.


Lemma 9. The functional $S_2^{(m)}$ is continuous.

Proof. It was shown in the proof of Lemma 4 that the numerator of (3.31) is continuous
and that $\sigma_{i,m}^2(g)$ is continuous. Since additions, square roots, divisions, etc. preserve
continuity, it follows that $S_2^{(m)}$ is continuous. □


Lemma 10. If the joint densities $f_i^j$, $i = 0, 1$, $j = 1, 2, \ldots$, are in $C$, then the
functional $S_2$ is continuous.

Proof. The proof consists of showing that $S_2^{(m)}(g)$ converges to $S_2(g)$ uniformly for
$g \in L^2(w)$. Define $\mathcal{G}_i$ to be the class of all functions $g \in L^2(w)$ such that $E_i\, g(X_1) = 0$ and
$E_i\, g^2(X_1) = 1$. From the proof of Lemma 5, it follows that $\sigma_{i,m}^2(g)$ converges uniformly
for $g \in \mathcal{G}_i$. Now suppose that $g$ is an arbitrary nonconstant element of $L^2(w)$. Then there
exist constants $a_i$ and $b_i$ such that $g_i = a_i g + b_i \in \mathcal{G}_i$, and obviously $\sigma_{i,m}^2(g_i) = a_i^2 \sigma_{i,m}^2(g)$
for all $m$ and $\sigma_i^2(g_i) = a_i^2 \sigma_i^2(g)$. Let $c = |a_1/a_0|$. Then we can write

$$|S_2(g) - S_2^{(m)}(g)| = \left| \frac{[\mu_1(g_1) - \mu_0(g_1)]^2}{[c\,\sigma_0(g_0) + \sigma_1(g_1)]^2} - \frac{[\mu_1(g_1) - \mu_0(g_1)]^2}{\{c[\sigma_0(g_0) + \epsilon_0] + [\sigma_1(g_1) + \epsilon_1]\}^2} \right| \tag{3.32}$$

where $\epsilon_0$ and $\epsilon_1$ converge to 0 uniformly for $g \in L^2(w)$ as $m \to \infty$. The right-hand side
of (3.32) is continuous as a function of $c$ for $c \in [0, \infty)$ and approaches 0 as $c$ approaches
$\infty$. Thus for $\epsilon_0$ and $\epsilon_1$ fixed, the right-hand side attains its maximum as $c$ varies, and
this maximum converges to 0 as $\epsilon_0$ and $\epsilon_1$ approach 0. Hence convergence of $S_2^{(m)}$ is
uniform. □


We  are  now  ready  to  prove  the  main  result. 

Theorem 11. If the joint densities $f_i^j$, $i = 0, 1$, $j = 1, 2, \ldots$, are in $C$, then there
exists a function $g_2 \in L^2(w)$ which maximizes $S_2$. If $g_2^{(m)}$ solves the integral equation
(3.15), then the sequence $\{S_2(g_2^{(m)})\}$ converges to $S_2(g_2)$ as $m \to \infty$.


Proof. Define $S'$ to be the subset of $L^2(w)$ consisting of the vectors $g$ satisfying
$\int g(x)^2 w(x)\,dx = 1$. Since $S_2$ is continuous and the set $S'$ is compact, there exists an
element $g_2$ of $S'$ such that $S_2$ achieves its maximum value on the set $S'$ at $g_2$. If $g$ is any
nonconstant element of $L^2(w)$, then there exist constants $a$ and $b$ such that $ag + b \in S'$.
But $S_2(g) = S_2(ag + b) \le S_2(g_2)$. Thus $g_2$ maximizes $S_2$.

Let $\epsilon > 0$. By the proof of Lemma 10, $S_2^{(m)}$ converges uniformly to $S_2$. Thus
there exists an integer $M$ such that for every $m > M$ and every $g \in L^2(w)$ we have
$|S_2^{(m)}(g) - S_2(g)| < \epsilon$. Let $m > M$ be fixed. If $S_2^{(m)}(g_2^{(m)}) \le S_2(g_2)$, then we must have
$S_2^{(m)}(g_2) \le S_2^{(m)}(g_2^{(m)}) \le S_2(g_2)$. Otherwise, we have $S_2(g_2^{(m)}) \le S_2(g_2) < S_2^{(m)}(g_2^{(m)})$.
In either case, $|S_2^{(m)}(g_2^{(m)}) - S_2(g_2)| < \epsilon$ and this implies that $|S_2(g_2^{(m)}) - S_2(g_2)| < 2\epsilon$.

□

CHAPTER  4


MINIMAX  ROBUSTNESS 


4.1  The  Robustness  Problem 

When  the  actual  probability  distributions  of  the  observed  processes  are  precisely 
the  same  as  the  distributions  which  were  assumed  in  deriving  a  particular  test,  then 
this  is  referred  to  as  a  matched  situation.  The  performance  of  a  test  in  the  matched 
situation  is  certainly  an  important  consideration.  However,  also  of  great  importance  is 
the  performance  of  the  test  under  the  mismatched  situation,  where  the  actual  probability 
distributions  are  close  to  but  slightly  different  from  the  assumed  distributions.  If  a  given 
decision  rule  performs  relatively  well  in  the  mismatched  situation,  as  compared  to  the 
matched  situation,  then  such  a  decision  rule  is  said  to  be  robust.  It  is  the  purpose  of  this 
chapter to address the issue of robustness in relation to the performance measures $S_1$,
$S_3$ and the corresponding optimal test statistics as given in the preceding chapters.

To begin, we must first make more precise mathematically the discussion of the
preceding paragraph. One approach to robustness which has become very popular and
which will be considered here is minimax robustness, a game theoretic approach. Define
$\mathcal{Q}_0$ and $\mathcal{Q}_1$ to be classes of distributions which are possible under the hypotheses
$H_0$ and $H_1$, respectively. These uncertainty classes are to contain the “nominal” distributions
(those distributions which are assumed initially) as well as those distributions
which are only slightly different from the nominal distributions. Now define the least
favorable distributions $F_0^* \in \mathcal{Q}_0$ and $F_1^* \in \mathcal{Q}_1$ to be the distributions which together with
the nonlinearity $g^*$ form a saddle point for the performance measure $S$ as follows:

$$S(g, F_0^*, F_1^*) \le S(g^*, F_0^*, F_1^*) \le S(g^*, F_0, F_1) \tag{4.1}$$

where $g$ is any other allowable nonlinearity and $F_0$ and $F_1$ are arbitrary distributions
from the classes $\mathcal{Q}_0$ and $\mathcal{Q}_1$. In this case $g^*$ is called a minimax robust nonlinearity, and
has the property that for any pair of distributions in $\mathcal{Q}_0$ and $\mathcal{Q}_1$, the value of $S$ evaluated
at $g^*$ is guaranteed to be at least $S(g^*, F_0^*, F_1^*)$.

The idea of this approach is like that of a game in which nature chooses the distributions
$F_0$, $F_1$ out of the classes $\mathcal{Q}_0$, $\mathcal{Q}_1$ and the human player chooses the nonlinearity
$g$, the performance measure $S(g, F_0, F_1)$ being the payoff. The first inequality in (4.1) is
usually not difficult to show. Indeed, if the least favorable distributions are known then
finding the nonlinearity $g^*$ is merely the problem considered in Chapter 2 or Chapter 3.
What is usually more difficult is finding the least favorable distributions $F_0^*$, $F_1^*$ and
showing the second inequality in (4.1). Obviously $g^*$, $F_0^*$, and $F_1^*$ solve the minimax
problem

$$\min_{F_0, F_1} \max_{g} S(g, F_0, F_1) \tag{4.2}$$

and in fact solving (4.2) is the simplest way to find $F_0^*$ and $F_1^*$, if they exist. That the
solution $g^*$, $F_0^*$, $F_1^*$ of (4.2) satisfies the left inequality in (4.1) is obvious, and the main
task in proving a result in robustness is showing that such a solution also satisfies the
right inequality. Equivalently, one might try to show that $g^*$, $F_0^*$, $F_1^*$ solve the maximin
problem

$$\max_{g} \min_{F_0, F_1} S(g, F_0, F_1) \tag{4.3}$$

since  the  solution  of  (4.3)  satisfies  the  right  inequality  in  (4.1). 
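The relation between the saddle point (4.1), the minimax problem (4.2), and the maximin problem (4.3) can be illustrated on a finite game. The sketch below is only an analogy under stated assumptions: the designer's choices $g$ are the rows of a small payoff table, nature's choices $(F_0, F_1)$ are the columns, and the payoff values are made up for illustration. The function names are hypothetical.

```python
def saddle_points(payoff):
    """Return all (row, col) pairs satisfying the finite analogue of (4.1).

    Rows index the designer's choices g (the maximizer); columns index
    nature's choices (F0, F1) (the minimizer).
    """
    points = []
    for r, row in enumerate(payoff):
        for c, value in enumerate(row):
            column = [other[c] for other in payoff]
            # Left inequality: no other row does better in this column.
            # Right inequality: no other column does worse in this row.
            if value == max(column) and value == min(row):
                points.append((r, c))
    return points


def maximin(payoff):
    """Finite analogue of (4.3): best guaranteed payoff for the designer."""
    return max(min(row) for row in payoff)


def minimax(payoff):
    """Finite analogue of (4.2): nature minimizes the designer's best payoff."""
    return min(max(col) for col in zip(*payoff))


# Hypothetical payoff table with a saddle point at row 1, column 1.
with_saddle = [[3, 1, 4],
               [5, 2, 6],
               [0, 1, 7]]

# Hypothetical payoff table without a saddle point.
without_saddle = [[1, 0],
                  [0, 1]]
```

In the first table the minimax and maximin values coincide and are attained at the saddle point; in the second they differ and no saddle point exists, which is why the existence of least favorable distributions has to be established rather than assumed.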

If one defines a metric on some class $\mathcal{M}$ of probability distributions, then a quite
natural way to define an uncertainty class $\mathcal{Q}$ about a nominal distribution $F$ is to include
all distributions in $\mathcal{M}$ which are at a distance $\epsilon$ or less from $F$. Such a definition might
be appropriate for minimax robustness if one wishes to exploit continuity properties of a
performance measure, since by continuity there will be a small change in the performance
if the distance between the distributions is small. The ε-contamination class which
we shall consider here is useful in a different sense and is defined by $\mathcal{Q} = \{\tilde{F} : \tilde{F} = (1 - \epsilon)F + \epsilon H,\ H \in \mathcal{M}\}$. Evidently, every distribution in the class $\mathcal{Q}$ is a mixture
of the nominal distribution $F$ and some unknown distribution $H$ with weights $(1 - \epsilon)$
and $\epsilon$. The corresponding physical interpretation is that an observation comes from the
distribution $F$ with probability $(1 - \epsilon)$ and from the distribution $H$ with probability $\epsilon$.
Thus if $F$ is a univariate distribution and the process is iid, then out of $n$ observations,
approximately $(1 - \epsilon)n$ will be from the distribution $F$ and approximately $\epsilon n$ will be
“corrupted” observations from the distribution $H$. Therefore in the iid case such an
uncertainty class has a pleasing physical interpretation. However, if the process is not
iid, as we wish to assume, then $F$ is an n-dimensional distribution and the interpretation
is that with probability $\epsilon$ the distribution is completely unknown. This interpretation
is not particularly desirable, and we shall therefore modify this particular uncertainty
model.
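The iid interpretation of the ε-contamination class can be made concrete with a short simulation. This is a minimal sketch under assumed choices that do not come from the text: a standard Gaussian nominal distribution $F$, a wide Gaussian contaminant $H$, $\epsilon = 0.1$, and a fixed random seed. The function `sample_contaminated` is a hypothetical helper.

```python
import random


def sample_contaminated(n, eps, draw_nominal, draw_contaminant, rng):
    """Draw n iid samples from (1 - eps) F + eps H and flag the corrupted ones."""
    samples, flags = [], []
    for _ in range(n):
        corrupted = rng.random() < eps  # with probability eps, draw from H
        samples.append(draw_contaminant(rng) if corrupted else draw_nominal(rng))
        flags.append(corrupted)
    return samples, flags


rng = random.Random(0)
samples, flags = sample_contaminated(
    n=10000,
    eps=0.1,
    draw_nominal=lambda r: r.gauss(0.0, 1.0),       # nominal F (assumed)
    draw_contaminant=lambda r: r.gauss(0.0, 10.0),  # unknown H (assumed)
    rng=rng,
)
fraction_corrupted = sum(flags) / len(flags)
```

With these settings roughly $\epsilon n$ of the observations come from $H$, which is the interpretation described above for the iid case.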

Since the performance measures we have derived involve only the marginal and
bivariate joint densities, we shall attempt to define uncertainty classes which involve
only these densities. In particular, we shall assume that the nominal distribution for our
uncertainty class $\mathcal{Q}$ is iid with marginal density $f$. The class $\mathcal{Q}$ is then defined to contain
all stationary process distributions $F$ such that the marginal density corresponding to $F$
is contained in an ε-contamination class about the nominal $f$, and such that the bivariate
joint distributions satisfy the condition

$$\sup_{g} \frac{|\mathrm{Cov}[g(X_1), g(X_{j+1})]|}{\sqrt{\mathrm{Var}\,g(X_1)\,\mathrm{Var}\,g(X_{j+1})}} \le r_j \tag{4.4}$$

where $g$ ranges over all measurable functions satisfying $E\,g^2(X_1) < \infty$. Since we assume
stationarity, the denominator is actually just $\mathrm{Var}\,g(X_1)$. The condition (4.4) is our way
of allowing for some uncertainty in the dependency structure of the process, so that not
every process in the class $\mathcal{Q}$ will be iid. Thus the uncertainty class $\mathcal{Q}$ is specified by
giving the nominal density $f$, the parameter $\epsilon$, and the sequence $\{r_j\}$.

In the analysis that follows, the least favorable marginal and bivariate joint distributions
are derived, and two issues regarding these least favorable distributions must
be addressed. First, it must be shown that there do in fact exist stochastic processes
having the prescribed distributions, and second, it must be shown that the processes
satisfy a mixing condition, so that the central limit theory may be applied as in the
preceding chapters. The necessary results for these two issues have been derived by Sadowsky
[9] and we shall adapt them as needed. For a fixed marginal distribution function
$F$, the least favorable bivariate distribution functions are given by ($F^j$ being the joint
distribution function for $X_1$ and $X_{j+1}$)

$$F^j(x,y) = (1 - r_j)F(x)F(y) + r_j F(x \wedge y), \qquad j = 1, 2, \ldots \tag{4.5a}$$

where $x \wedge y$ is the minimum of $x$ and $y$. If the distribution function $F$ has a density $f$,
then we may write for the bivariate densities

$$f^j(x,y) = (1 - r_j) f(x) f(y) + r_j\,\delta(x - y) f(x), \qquad j = 1, 2, \ldots. \tag{4.5b}$$

It can easily be seen that for such distributions equality is achieved in (4.4) for any $g$
such that $E\,g^2(X_1) < \infty$. When $\sigma^2(g)$ is computed using the bivariate distributions
given by (4.5) and $R = \sum_j r_j < \infty$, then we have

$$\sigma^2(g) = (1 + 2R)\,\mathrm{Var}\,g(X_1) \tag{4.6}$$

which depends only on the marginal density. Furthermore, if we define $\mathcal{G}$ to be the class
of all measurable functions $g$ such that $\mathrm{Var}\,g(X_1) = 1$, then the supremum of $\sigma^2$ over $\mathcal{G}$
is achieved by every $g$ in $\mathcal{G}$ and is equal to $(1 + 2R)$. That a process exists which has
distributions given by (4.5) is shown in two different constructive proofs by Sadowsky
[9]. Let $\{\theta_i\}$ be a sequence of nonnegative real numbers such that $\sum_{i=1}^{\infty} \theta_i = 1$. Then
in these two constructions the r-sequences are given by $r_j = \sum_{m=0}^{\infty} \theta_m \theta_{j+m}$ in one case

OO  ^  ^  . 

and  by  r  i-  T.  in  the  other.  Thus  in  Sadowsky’s  constructions,  arbitrary 

m 

m=j+l 

r-sequences may not be possible. Note that if the sequence $\{\theta_i\}$ takes positive values
only for $i = 1, 2, \ldots, m$, then the processes in either construction are m-dependent.
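The claims that equality holds in (4.4) and that (4.6) follows can be checked on a small discrete analogue of (4.5b). The sketch below is an illustration under assumptions that are not in the text: the marginal is a three-point probability mass function, the Dirac term is replaced by an indicator on the diagonal, and $\sigma^2(g)$ is taken to be the usual sum $\mathrm{Var}\,g(X_1) + 2\sum_j \mathrm{Cov}[g(X_1), g(X_{j+1})]$. All function names and numbers are hypothetical.

```python
def joint_pmf(p, r):
    """Discrete analogue of (4.5b): (1 - r) p(x) p(y) + r p(x) 1{x = y}."""
    n = len(p)
    return [[(1.0 - r) * p[x] * p[y] + (r * p[x] if x == y else 0.0)
             for y in range(n)] for x in range(n)]


def mean(g, p):
    return sum(gx * px for gx, px in zip(g, p))


def variance(g, p):
    return sum(gx * gx * px for gx, px in zip(g, p)) - mean(g, p) ** 2


def covariance(g, joint, p):
    """Cov[g(X_1), g(X_{j+1})] for a joint pmf with marginal p."""
    n = len(p)
    cross = sum(g[x] * g[y] * joint[x][y] for x in range(n) for y in range(n))
    return cross - mean(g, p) ** 2


def asymptotic_variance(g, p, r_sequence):
    """Var g + 2 * sum_j Cov[g(X_1), g(X_{j+1})] (assumed form of sigma^2)."""
    total = variance(g, p)
    for r in r_sequence:
        total += 2.0 * covariance(g, joint_pmf(p, r), p)
    return total


p = [0.2, 0.5, 0.3]            # hypothetical marginal pmf
g = [-1.0, 0.5, 2.0]           # hypothetical nonlinearity values
r_sequence = [0.4, 0.2, 0.1]   # hypothetical r-sequence with R = 0.7

cov_lag1 = covariance(g, joint_pmf(p, r_sequence[0]), p)
sigma2 = asymptotic_variance(g, p, r_sequence)
```

For any choice of `g` the lag covariance equals $r_j$ times the variance, so the ratio in (4.4) is exactly $r_j$, and the total comes out as $(1 + 2R)$ times the variance, in agreement with (4.6).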

The  following  central  limit  theorem,  from  [9],  is  appropriate: 


Theorem 12. Let $0 < \delta \le \infty$ and set $q = \delta/(2 + \delta)$ if $\delta < \infty$ or $q = 1$ if $\delta = \infty$.
Let $\{X_j\}$ be a stationary strong mixing process which satisfies (1.9) with

$$\sum_{j=1}^{\infty} \alpha_j^{q} < \infty, \tag{4.7}$$

and let $g$ be a Borel function such that $E\,|g(X_1)|^{2+\delta} < \infty$ if $\delta < \infty$ or such that $|g(X_1)|$
is almost surely bounded if $\delta = \infty$. Then the sum $\sigma^2(g)$ defined in (1.8) converges,
and $[T_n(X) - n\mu(g)]/\sqrt{n\sigma^2(g)}$ converges in distribution to a standard normal random
variable, provided $\sigma^2(g) > 0$.

Further  results  from  [9]  show  that  the  process  is  strong  mixing  and  the  condition  (4.7) 
is  satisfied  if  there  exist  constants  K  >  0  and  c  >  0  such  that  r:  <  Kj~(l+'i*+ «)/«,  Thus 
if the r-sequence is dominated by an exponential sequence, the condition (4.7) holds. To
apply the theorem, then, it remains only to show that $E\,|g(X_1)|^{2+\delta} < \infty$, and this holds
in particular if $g$ is bounded. Thus the two issues mentioned above are resolved.

Because the condition (4.4) is defined in terms of a supremum over second-order
functions $g$, it is not directly obvious whether a given bivariate distribution satisfies such
a bound. In order to determine whether the bound holds for a given bivariate density
$f^j$, it is useful to consider the diagonal expansion (2.22), if it exists. Any $g$
which satisfies $\int g^2(x) f(x)\,dx < \infty$ has an expansion $g(x) = \sum_n b_n \theta_n(x)$, and for such
an expansion, we have as well that

$$\int g^2(x) f(x)\,dx = \sum_n b_n^2,$$

$$\iint g(x) g(y) f^j(x,y)\,dx\,dy = \sum_n a_n^{(j)} b_n^2,$$

$$\int g(x) f(x)\,dx = b_0.$$

These expressions imply that

$$\frac{|\mathrm{Cov}[g(X_1), g(X_{j+1})]|}{\mathrm{Var}\,g(X_1)} = \frac{\left|\sum_{n \ge 1} a_n^{(j)} b_n^2\right|}{\sum_{n \ge 1} b_n^2}. \tag{4.8}$$
Since $\mathrm{Cov}[g(X_1), g(X_{j+1})]/\mathrm{Var}\,g(X_1)$ is invariant under the scaling of $g$, we can assume
without loss of generality that $\sum_{n \ge 1} b_n^2 = 1$, so that the denominator of (4.8) is equal
to 1. Then it is obvious that the sup in (4.4) is attained by $\theta_i$, where $i$ is such that
$|a_i^{(j)}| = \max\{|a_n^{(j)}|,\ n \ge 1\}$. If the orthonormal functions $\{\theta_n\}$ are polynomials (that
is, $f^j$ is in the class $C$ defined in Chapter 2) then this maximum coefficient occurs as
either $a_1^{(j)}$ or $a_2^{(j)}$. To show this, we require a fact from [12] that for any such diagonal
expansion in which the orthonormal functions are polynomials, there exists a probability
density function $h_j$ having support in the interval $[-1, 1]$ such that $a_n^{(j)} = \int t^n h_j(t)\,dt$.
Then for $n \ge 2$ it is obvious that

$$|a_n^{(j)}| \le \int |t|^n h_j(t)\,dt \le \int t^2 h_j(t)\,dt = |a_2^{(j)}|$$

so that the assertion holds. Let $\xi_n = \int x^n f(x)\,dx$, $\xi_{mn}^{(j)} = \iint x^m y^n f^j(x,y)\,dx\,dy$, and
$\sigma^2 = \xi_2 - \xi_1^2$. Then for this case $a_1^{(j)}$ and $a_2^{(j)}$ are given by


u)  _  cii^  ~ 

/r2 


ai  = 


<£>  = 


n  +  2o2(66  -  6)Cff  +  (66  ~  6)2C,7  -  (£2  ~  66) 
*46  +  2^(66  -  6)6  +  (66  -  6)26  -  (62  -  66)  ' 


4i) 


In the sections that follow, we will consider the robustness problem (4.1) where
the performance measure $S$ is either $S_1$ or $S_3$. We assume that under the hypothesis $H_i$,
the true distribution of the observed process is in the class $\mathcal{Q}_i$, which is defined as above
by the nominal marginal density $f_i$, the parameter $\epsilon_i$, and an r-sequence which has the
sum $R_i$. For given marginal densities, it will be shown that the bivariate distributions
defined in (4.5) are in fact least favorable, and this reduces the problem (4.1) to one
which involves only the marginal densities. We now give the least favorable marginal
densities, which we will call the Huber-Strassen least favorable densities. These are

$$\begin{aligned}
p_0(x) &= \begin{cases} (1 - \epsilon_0) f_0(x) & \text{if } f_1(x)/f_0(x) < c'' \\ (1/c'')(1 - \epsilon_0) f_1(x) & \text{if } f_1(x)/f_0(x) \ge c'' \end{cases} \\
p_1(x) &= \begin{cases} (1 - \epsilon_1) f_1(x) & \text{if } f_1(x)/f_0(x) > c' \\ c'(1 - \epsilon_1) f_0(x) & \text{if } f_1(x)/f_0(x) \le c' \end{cases}
\end{aligned} \tag{4.9}$$

where the constants $c'$ and $c''$ are chosen such that the functions are valid probability
densities (i.e. they integrate to 1). The Huber-Strassen densities have appeared frequently
as the solution to various minimax robustness problems. Lemma 13 is the basis
for many such applications.
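The normalization requirement on the constants in (4.9) can be solved numerically. The following is a minimal sketch for the constant $c''$ only, under assumptions that are not from the text: the nominals are unit-variance Gaussians $f_0 = N(0,1)$ and $f_1 = N(\text{shift}, 1)$ with shift > 0, so that the likelihood ratio is monotone and the two regions in (4.9) are half-lines, and bisection is carried out on $\log c''$. The function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p0_mass(c, eps0, shift):
    """Total mass of p0 in (4.9) for f0 = N(0, 1), f1 = N(shift, 1), shift > 0."""
    # f1(x)/f0(x) = exp(shift*x - shift**2/2) >= c  is equivalent to  x >= t.
    t = math.log(c) / shift + 0.5 * shift
    below = std_normal_cdf(t)                # integral of f0 over {f1/f0 < c}
    above = 1.0 - std_normal_cdf(t - shift)  # integral of f1 over {f1/f0 >= c}
    return (1.0 - eps0) * (below + above / c)


def solve_c_upper(eps0, shift, tol=1e-13):
    """Bisection on log(c) for the constant c'' that makes p0 integrate to 1."""
    lo, hi = 0.0, 50.0
    if p0_mass(math.exp(lo), eps0, shift) <= 1.0:
        raise ValueError("no root with c'' > 1; eps0 is too large for this shift")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The mass is decreasing in c, so a mass above 1 means c is too small.
        if p0_mass(math.exp(mid), eps0, shift) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


c_upper = solve_c_upper(eps0=0.1, shift=1.0)
```

The constant $c'$ for $p_1$ could be found the same way by exchanging the roles of the two nominals; when the contamination level is too large relative to the separation of the nominals, no admissible constant exists and the sketch raises an error.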


Lemma 13. For $i = 0, 1$, let $\mathcal{P}_i$ be the class of all probability density functions of the
form $\tilde{f}_i = (1 - \epsilon_i) f_i + \epsilon_i h$, where $f_i$ is fixed and $h$ is arbitrary, and let $\Psi$ be any convex
function. If $p_0$ and $p_1$ are the Huber-Strassen least favorable densities corresponding to
$f_0$ and $f_1$, then the inequality

$$\int \Psi\!\left(\frac{p_1(x)}{p_0(x)}\right) p_0(x)\,dx \le \int \Psi\!\left(\frac{\tilde{f}_1(x)}{\tilde{f}_0(x)}\right) \tilde{f}_0(x)\,dx$$

holds for all marginal densities $\tilde{f}_0 \in \mathcal{P}_0$ and $\tilde{f}_1 \in \mathcal{P}_1$.

Proof. It has been shown in [7] that the least favorable densities in terms of risk for
the classes $\mathcal{P}_0$ and $\mathcal{P}_1$ are the Huber-Strassen densities. The proof then follows as a
corollary to Lemma 1 in [15]. □


In addition to the ε-contamination classes, the Huber-Strassen densities are also
least favorable in terms of risk for at least three other uncertainty classes: the total
variation classes [7], bounded classes [17], and p-point classes [18]. Thus Lemma 13
holds as well if the classes $\mathcal{P}_0$, $\mathcal{P}_1$ are both of one of these other three classes.

The  main  result  of  this  chapter  is  stated  in  the  following  theorem. 
Theorem 14. The least favorable process distributions $F_0^*$, $F_1^*$ in the classes $\mathcal{Q}_0$, $\mathcal{Q}_1$
are such that their marginal densities are the Huber-Strassen densities (4.9) and their
bivariate joint distributions are defined by (4.5).

4.2  Robustness for $S_1$

In the first section the idea of minimax robustness was discussed and the problem
(4.1) posed without reference to a particular performance measure. We are now ready
to find the solution to (4.1) with the performance measure $S$ taken to be $S_1$ as defined
in (2.6). Our first task is to show that for arbitrary but fixed marginal densities, the
bivariate distributions defined by (4.5) are in fact least favorable. For $i = 0, 1$, assume
that the marginal density $f_i$ is fixed and denote by $\mathcal{R}_i$ the subset of $\mathcal{Q}_i$ containing all
the distributions $F_i$ which agree with the fixed marginal density. Let $F_i^*$ denote any
such distribution in $\mathcal{R}_i$ having bivariate distributions defined by (4.5). To show that the
distributions $F_0^*$ and $F_1^*$ are least favorable, we must show the inequalities (4.1). From
the result (4.6) and the fact that $S_1$ is invariant under the scaling of $g$, it is clear that
the left inequality is an equality for any allowable nonlinearity $g$. From (4.4) it is clear
that the inequality

$$\sigma_i^2(g) \le (1 + 2R_i)\,\mathrm{Var}_i\,g(X_1) \tag{4.10}$$

always holds for any distributions in the uncertainty class $\mathcal{Q}_i$. But (4.6) implies that
under the distributions $F_0^*$ and $F_1^*$, equality is obtained in (4.10) for arbitrary $g$, and
in particular equality holds for $g^*$. These facts imply that the right inequality in (4.1)
holds. Thus $F_0^*$ and $F_1^*$ are the least favorable distributions in the classes $\mathcal{R}_0$ and $\mathcal{R}_1$.
Now suppose we define a new performance measure $\tilde{S}_1$ by

$$\tilde{S}_1(g, f_0, f_1) = \frac{[\mu(g; f_1) - \mu(g; f_0)]^2}{\sigma^2(g; f_1)} \tag{4.11}$$

where we introduce the new notation

$$\mu(g; f) = \int g(x) f(x)\,dx$$
$$\sigma^2(g; f) = \int g^2(x) f(x)\,dx - [\mu(g; f)]^2.$$

Such a performance measure depends only on the marginal densities. Suppose that we
find $g^*$, $f_0^*$, $f_1^*$ which form a saddle point for $\tilde{S}_1$, with $f_0^*$ and $f_1^*$ being allowable marginal
densities for the classes $\mathcal{Q}_0$ and $\mathcal{Q}_1$. Thus

$$\tilde{S}_1(g, f_0^*, f_1^*) \le \tilde{S}_1(g^*, f_0^*, f_1^*) \le \tilde{S}_1(g^*, f_0, f_1). \tag{4.12}$$

Now if we consider the performance measure $S_1$, and $g^*$, $F_0^*$, $F_1^*$, where $F_i^*$ is a distribution
having marginal density $f_i^*$ and bivariate distributions defined by (4.5), we find
that we have a saddle point for the classes $\mathcal{Q}_0$ and $\mathcal{Q}_1$. Indeed, we find that

$$S_1(g, F_0^*, F_1^*) = \frac{\tilde{S}_1(g, f_0^*, f_1^*)}{1 + 2R_1} \le \frac{\tilde{S}_1(g^*, f_0^*, f_1^*)}{1 + 2R_1} = S_1(g^*, F_0^*, F_1^*).$$

Furthermore, we have

$$S_1(g^*, F_0^*, F_1^*) = \frac{\tilde{S}_1(g^*, f_0^*, f_1^*)}{1 + 2R_1} \le \frac{\tilde{S}_1(g^*, f_0, f_1)}{1 + 2R_1} = S_1(g^*, F_0', F_1') \le S_1(g^*, F_0, F_1)$$

where $F_i'$ denotes the process distribution having $f_i$ as the marginal density and bivariate
distributions given by (4.5), and $F_i$ denotes an arbitrary process distribution which has
the marginal density $f_i$. Our conclusion now is that we need only solve the problem
involving the performance measure $\tilde{S}_1$ and the marginal densities; that is, we must find
the least favorable marginal densities $f_0^*$ and $f_1^*$ which together with $g^*$ solve the problem
(4.12). Then the least favorable process distributions are such that they agree with the
marginal densities $f_0^*$, $f_1^*$ and have bivariate distributions given by (4.5).

Our method will be as follows: First we find the solution $g^*$, $f_0^*$, $f_1^*$ to the minimax
problem (4.2), where $f_0^*$ and $f_1^*$ are in the classes $\mathcal{Q}_0'$ and $\mathcal{Q}_1'$ which we define to be the
classes of all marginal densities which are derived from the classes $\mathcal{Q}_0$, $\mathcal{Q}_1$. Recall that $\mathcal{Q}_i'$
is an ε-contamination class with nominal univariate density $f_i$ and parameter $\epsilon_i$. If the
problem (4.1) has a solution, then necessarily it must be $g^*$, $f_0^*$, $f_1^*$. Thus at this point
we have likely candidates for the robust nonlinearity and the least favorable densities.
The second step is to show that the right inequality in (4.1) is satisfied; that is, we must
show that

$$\tilde{S}_1(g^*, f_0^*, f_1^*) = \inf_{f_0, f_1} \tilde{S}_1(g^*, f_0, f_1). \tag{4.13}$$

For given marginals $f_0$, $f_1$, the optimal nonlinearity $g$ is given by the solution of
the integral equation (2.19) with $m = 0$, and it is easily verified that a solution is given
by $g(x) = -[f_0(x)/f_1(x)]$. By Theorem 3, we have that

$$\tilde{S}_1(g, f_0, f_1) = \int g(x)[f_1(x) - f_0(x)]\,dx = \int \frac{f_0^2(x)}{f_1(x)}\,dx - 1. \tag{4.14}$$

Thus for the first step, solving (4.2), we must minimize the rightmost integral in (4.14).
Lemma 13 applies with $\Psi(x) = x^{-1}$, which is convex. Thus our candidates for the
least favorable marginal densities are the Huber-Strassen densities corresponding to the
nominals $f_0$ and $f_1$.
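The closed form in (4.14) can be checked numerically for a simple pair of densities. The sketch below is illustrative only and rests on assumptions that are not in the text: the marginals are unit-variance Gaussians with means 0 and 1, the integrals are approximated by a trapezoidal rule, and the performance measure is taken to be the squared mean shift divided by the variance of $g$ under $f_1$. For this Gaussian pair the integral of $f_0^2/f_1$ equals $e$, so both forms should be close to $e - 1$.

```python
import math


def normal_pdf(x, mean):
    """Unit-variance Gaussian density (illustrative choice, not from the text)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def integrate(func, lo=-12.0, hi=12.0, steps=4800):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


f0 = lambda x: normal_pdf(x, 0.0)
f1 = lambda x: normal_pdf(x, 1.0)
g = lambda x: -f0(x) / f1(x)  # solution quoted in the text for m = 0

mu0 = integrate(lambda x: g(x) * f0(x))
mu1 = integrate(lambda x: g(x) * f1(x))
var1 = integrate(lambda x: g(x) ** 2 * f1(x)) - mu1 ** 2

# Assumed form of the measure: squared mean shift over the variance under f1.
ratio_form = (mu1 - mu0) ** 2 / var1

# Rightmost expression in (4.14).
integral_form = integrate(lambda x: f0(x) ** 2 / f1(x)) - 1.0
```

The agreement of `ratio_form` and `integral_form` illustrates why minimizing the performance over the marginals reduces to minimizing the integral of $f_0^2/f_1$, which is the quantity to which Lemma 13 is applied.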

We must now show that the right inequality in (4.1) is satisfied by $g^* = -f_0^*/f_1^*$,
where $f_0^*$ and $f_1^*$ are the Huber-Strassen densities. A complete proof of this result has
been published in [16], and therefore only a sketch of the proof will be given here.
Lemma 15, whose proof is given in [13], will be used here as well as in the next section
to show that certain functions are convex.

Lemma 15. If $v_1 > 0$, $v_2 > 0$, and $0 \le \alpha \le 1$ then

$$\frac{[\alpha u_1 + (1 - \alpha) u_2]^2}{\alpha v_1 + (1 - \alpha) v_2} \le \alpha \frac{u_1^2}{v_1} + (1 - \alpha) \frac{u_2^2}{v_2}.$$

By virtue of Lemma 15, the performance measure $\tilde{S}_1$ is convex in the densities
$f_0$, $f_1$ for fixed $g$. This implies that the function $J(\alpha; f_0, f_1) = \tilde{S}_1[g^*, (1 - \alpha) f_0^* + \alpha f_0, (1 - \alpha) f_1^* + \alpha f_1]$ is convex in $\alpha$. Furthermore, a necessary and sufficient condition for
(4.13) to hold is that $\frac{d}{d\alpha} J(\alpha; f_0, f_1)\big|_{\alpha=0} \ge 0$ for arbitrary $f_0$, $f_1$. However, it is also true
from considering (4.14) that $f_0^*$ and $f_1^*$ minimize the functional $T[f_0, f_1] = \int (f_0^2/f_1)$.
Since $T$ is also convex in $f_0$, $f_1$, we must have also the condition that $\frac{d}{d\alpha} T[(1 - \alpha) f_0^* + \alpha f_0, (1 - \alpha) f_1^* + \alpha f_1]\big|_{\alpha=0} \ge 0$. By considering these two derivatives, it can be shown
that the condition that $f_0^*$, $f_1^*$ minimize $T$ is equivalent to the condition that they
minimize $\tilde{S}_1(g^*, f_0, f_1)$. A similar proof is given in greater detail in the next section for
the performance measure $S_3$.


4.3  Robustness for $S_3$


We will obtain in this section essentially the same result for the performance
measure $S_3$ as that obtained in the preceding section for the performance measure $S_1$.
The first task is to show that the problem of finding the least favorable distributions again
reduces to a problem involving only the marginal densities. Define a new performance
measure

$$\tilde{S}_3(g, f_0, f_1) = \frac{[\mu(g; f_1) - \mu(g; f_0)]^2}{\sigma^2(g; f_0) + A\,\sigma^2(g; f_1)} \tag{4.15}$$

where $A = [(1 + 2R_1)/(1 + 2R_0)]$. Such a performance measure depends only on the
marginal distributions $f_0$, $f_1$. If $F_0$, $F_1$ are process distributions which have marginal
distributions $f_0$, $f_1$ and bivariate distributions defined by (4.5), then we have the relation

$$S_3(g, F_0, F_1) = \frac{\tilde{S}_3(g, f_0, f_1)}{1 + 2R_0}.$$

A series of equalities and inequalities similar to those at the beginning of Section 4.2
can be used to show that the problem (4.1) for the performance measure $S_3$ reduces
to the problem involving only the performance measure $\tilde{S}_3$. Thus if one finds the least
favorable marginal distributions $f_0^*$, $f_1^*$ and the minimax robust nonlinearity $g^*$ for the
performance measure $\tilde{S}_3$, then the minimax problem for $S_3$ is solved by taking the least
favorable process distributions to be such that the marginal distributions are $f_0^*$, $f_1^*$ and
the bivariate distributions are given by (4.5).

The integral equation which yields the optimal nonlinearity for $\tilde{S}_3$ is similar to
(3.27) with $m = 0$ except for the coefficient $A$:

$$g(x) = \frac{f_1(x) - f_0(x)}{f_0(x) + A f_1(x)} + \int \left[\frac{f_0(x) f_0(y) + A f_1(x) f_1(y)}{f_0(x) + A f_1(x)}\right] g(y)\,dy. \tag{4.16}$$

We have immediately the form of the solution

$$g(x) = \frac{B_0 f_0(x) + B_1 f_1(x)}{f_0(x) + A f_1(x)} \tag{4.17}$$

where $B_0 = \mu_0 - 1$ and $B_1 = A\mu_1 + 1$. If we consider the linear system of equations

$$\mu_0 = B_0 + 1 = \int g(x) f_0(x)\,dx$$
$$\mu_1 = \frac{1}{A}(B_1 - 1) = \int g(x) f_1(x)\,dx$$

with $B_0$, $B_1$ as the unknowns, then we find that the system is singular, and consequently
we may assign to $B_0$ the arbitrary value 0. This implies that

$$B_1 = \left[\int \frac{f_0(x) f_1(x)}{f_0(x) + A f_1(x)}\,dx\right]^{-1}. \tag{4.18}$$
If $g$ is the optimal nonlinearity which is matched to $f_0$, $f_1$, then we know that

$$\tilde{S}_3(g, f_0, f_1) = \int g(x)[f_1(x) - f_0(x)]\,dx = B_1(f_0, f_1) \int \frac{f_1(x)[f_1(x) - f_0(x)]}{f_0(x) + A f_1(x)}\,dx \tag{4.19}$$

where we have written $B_1$ as a function of $f_0$ and $f_1$ to remind us of the relation (4.18).
Lemma 13 applies to the integral in (4.19) with $\Psi(x) = x(x - 1)/(Ax + 1)$, which is convex,
so that the integral in (4.19) is minimized by the Huber-Strassen densities. Lemma 13
also applies to the integral in (4.18). In this case $\Psi(x) = x/(Ax + 1)$, which is concave, so
that by applying the lemma to the negative of the integral (since $-\Psi$ is convex) we find
that this integral is maximized by the Huber-Strassen densities. $B_1(f_0, f_1)$ therefore
is minimized. Thus our candidates for the least favorable marginal densities are the
Huber-Strassen densities. The right inequality in (4.1) will now be proved.

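The following inequalities, which depend on the fact that $\sigma^2(g; f)$ is concave in
$f$ and on Lemma 15, demonstrate that $\tilde{S}_3(g, f_0, f_1)$ is convex in $f_0$ and $f_1$ for fixed $g$.
With $\beta = (1 - \alpha)$ we have

$$\begin{aligned}
\tilde{S}_3(g, \beta f_0 + \alpha f_0', \beta f_1 + \alpha f_1') &= \frac{[\mu(g; \beta f_1 + \alpha f_1') - \mu(g; \beta f_0 + \alpha f_0')]^2}{\sigma^2(g; \beta f_0 + \alpha f_0') + A\,\sigma^2(g; \beta f_1 + \alpha f_1')} \\
&\le \frac{\left(\beta\{\mu(g; f_1) - \mu(g; f_0)\} + \alpha\{\mu(g; f_1') - \mu(g; f_0')\}\right)^2}{\beta[\sigma^2(g; f_0) + A\,\sigma^2(g; f_1)] + \alpha[\sigma^2(g; f_0') + A\,\sigma^2(g; f_1')]} \\
&\le \beta \tilde{S}_3(g, f_0, f_1) + \alpha \tilde{S}_3(g, f_0', f_1').
\end{aligned}$$

Define the function

$$J(\alpha; f_0, f_1) = \tilde{S}_3[g^*, (1 - \alpha) f_0^* + \alpha f_0, (1 - \alpha) f_1^* + \alpha f_1], \qquad 0 \le \alpha \le 1$$

where $f_0^*$ and $f_1^*$ are the Huber-Strassen least favorable densities and $g^*$ is the optimal
nonlinearity matched to $f_0^*$, $f_1^*$. Certainly $J$ is convex in $\alpha$ if $\tilde{S}_3$ is convex in $f_0$ and $f_1$.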
Now the right inequality in (4.1) holds if and only if
\[ J(\alpha; f_0, f_1) \ge J(0; f_0, f_1) \tag{4.20} \]
for all $\alpha$ in the interval $[0, 1]$, and since $J$ is convex in $\alpha$, (4.20) holds if and only if we have the condition
\[ \frac{d}{d\alpha} J(\alpha; f_0, f_1)\Big|_{\alpha=0} \ge 0. \tag{4.21} \]
If we take the derivative of $J(\alpha; f_0, f_1)$ and set $\alpha = 0$, then we have


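\begin{align*}
\frac{d}{d\alpha} J(\alpha; f_0, f_1)\Big|_{\alpha=0}
&= 2\int g^*(f_1 - f_0 - f_1^* + f_0^*) - \int (g^*)^2 (f_0 + A f_1) + \int (g^*)^2 (f_0^* + A f_1^*) \\
&\quad + 2\Big[\int g^* f_0^*\Big]\Big[\int g^* (f_0 - f_0^*)\Big] + 2A\Big[\int g^* f_1^*\Big]\Big[\int g^* (f_1 - f_1^*)\Big] \\
&= 2B_1\Big[\int g^* f_1 - \int g^* f_1^*\Big] + \int (g^*)^2 [(f_0^* + A f_1^*) - (f_0 + A f_1)]. \tag{4.22}
\end{align*}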

We can now show that (4.21) holds by considering the function
\[ T[f_0, f_1] = \int \frac{f_1^2(x)}{f_0(x) + A f_1(x)}\,dx \tag{4.23} \]
which by Lemma 13 is minimized by the Huber-Strassen densities. Define
\[ K(\alpha; f_0, f_1) = T[(1-\alpha) f_0^* + \alpha f_0, (1-\alpha) f_1^* + \alpha f_1]. \]


It follows from Lemma 15 that $T$ is convex in $f_0$, $f_1$. By the same reasoning as before, then, we conclude that $f_0^*$, $f_1^*$ minimize $T$ if and only if
\[ \frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha=0} \ge 0. \tag{4.24} \]
The final step in our proof is to show that the inequality (4.24) implies the inequality (4.21). Define $p_\alpha^{(i)} = (1-\alpha) f_i^* + \alpha f_i$ for $i = 0, 1$ and $0 \le \alpha \le 1$, so that we have


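\[ K(\alpha; f_0, f_1) = \int \frac{(p_\alpha^{(1)})^2}{p_\alpha^{(0)} + A p_\alpha^{(1)}}\,dx \tag{4.25} \]
and thus
\[ \frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha=0} = \frac{d}{d\alpha} \int \frac{(p_\alpha^{(1)})^2}{p_\alpha^{(0)} + A p_\alpha^{(1)}}\,dx\,\Big|_{\alpha=0}. \]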

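The derivative of the integrand in (4.25) is
\begin{align*}
\frac{d}{d\alpha}\left[\frac{(p_\alpha^{(1)})^2}{p_\alpha^{(0)} + A p_\alpha^{(1)}}\right]_{\alpha=0}
&= \lim_{\alpha \to 0} \frac{1}{\alpha}\left[\frac{(p_\alpha^{(1)})^2}{p_\alpha^{(0)} + A p_\alpha^{(1)}} - \frac{(p_0^{(1)})^2}{p_0^{(0)} + A p_0^{(1)}}\right] \\
&= \frac{2 f_1^*}{f_0^* + A f_1^*}(f_1 - f_1^*) + \left(\frac{f_1^*}{f_0^* + A f_1^*}\right)^2 [(f_0^* + A f_1^*) - (f_0 + A f_1)].
\end{align*}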

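To differentiate $K$ we must justify the interchanging of the integration and differentiation operations. The convexity of $K$ as a function of $\alpha$ implies the inequalities
\begin{align*}
&\frac{2 f_1^*}{f_0^* + A f_1^*}(f_1 - f_1^*) + \left(\frac{f_1^*}{f_0^* + A f_1^*}\right)^2 [(f_0^* + A f_1^*) - (f_0 + A f_1)] \\
&\qquad \le \frac{1}{\alpha}\left[\frac{(p_\alpha^{(1)})^2}{p_\alpha^{(0)} + A p_\alpha^{(1)}} - \frac{(p_0^{(1)})^2}{p_0^{(0)} + A p_0^{(1)}}\right] \\
&\qquad \le \frac{(p_1^{(1)})^2}{p_1^{(0)} + A p_1^{(1)}} - \frac{(p_0^{(1)})^2}{p_0^{(0)} + A p_0^{(1)}}. \tag{4.26}
\end{align*}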

The right quantity in (4.26) is integrable, and the middle quantity converges pointwise monotonically to the left quantity as $\alpha \to 0$ because of the convexity of $K$. The monotone convergence theorem then permits the interchange of the differentiation and the integration, and we have
\[ \frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha=0} = \int \left\{ \frac{2 f_1^*}{f_0^* + A f_1^*}(f_1 - f_1^*) + \left(\frac{f_1^*}{f_0^* + A f_1^*}\right)^2 [(f_0^* + A f_1^*) - (f_0 + A f_1)] \right\} dx. \tag{4.27} \]
Now if we compare equations (4.22) and (4.27), then we see that (compare (4.17))
\[ \frac{d}{d\alpha} J(\alpha; f_0, f_1)\Big|_{\alpha=0} = B_1^2\, \frac{d}{d\alpha} K(\alpha; f_0, f_1)\Big|_{\alpha=0} \]
and thus conditions (4.21) and (4.24) are equivalent.
CHAPTER 5


NUMERICAL  RESULTS  AND  CONCLUSION 


5.1  Description  of  the  Examples 

In the work of the preceding chapters, we attempted to justify the use of the various performance measures by showing that a large value of a performance measure results in good performance as determined by the actual error probabilities. Such an approach is necessary since the performance measures are mathematically tractable, whereas the error probabilities themselves are not. The error probabilities, however, can be estimated by simulation on a digital computer. Such simulation results, presented in this chapter for several different examples, will complete this work.

There are two questions concerning which we might like to gain some insight as a result of these computer simulations. First, and perhaps foremost, is the question about the validity of the assumptions which were made in justifying the various performance measures. In particular, we assumed that the distribution of the test statistic was approximately Gaussian, and in fact, under the hypotheses of Theorem 1 or Theorem 12 the distribution of the normalized test statistic converges to a Gaussian distribution as the sample size approaches infinity. Our tests shall have finite sample sizes, however, and therefore the effect of the finite sample size should be examined. The second question which warrants our attention is of a philosophical nature. The processes between which we wish to discriminate are dependent, and thus they necessarily involve memory. However, the tests with which we wish to perform such discrimination are memoryless, and therefore it is not clear to the intuition that such a scheme can work effectively. In the case of mixing processes, where there is "asymptotic independence," we know that memoryless discrimination is possible, and that the performance will improve as the sample size increases. Our concern here should be the improvement of a given memoryless discriminator over the discriminator which is designed under the assumption that the processes are iid (i.e., the LRT (1.2)).

In all of the examples which are presented here, the marginal densities will be the same throughout, and the varying parameters will be the time constants of the dependency lengths and the sample sizes of the tests. We assume that the process has a Rayleigh distribution with parameter $\theta = 4$ under hypothesis $H_0$ and a lognormal distribution with parameters $A_1 = 0.8$, $A_2 = 0.25$ under hypothesis $H_1$. The Rayleigh and lognormal densities are given in Appendices A and B, respectively. These distributions have found application in radar discrimination problems, and indeed such has been the motivation behind this research. The $n$-dimensional Rayleigh density is such that it generally lacks a closed form expression when $n > 2$, and thus an LRT is not feasible. The parameters $\rho_j$ which appear in the expressions for both the Rayleigh bivariate densities (A2) and the lognormal bivariate densities (B4) are actually the correlation coefficients of the underlying Gaussian process(es). In each of the examples of this chapter, we shall assume that the values of the $\rho_j$ parameters are given by exponentially decaying sequences which are determined by a time constant $\tau_i$. Thus under hypothesis $H_i$, we have $\rho_j = \exp(-j/\tau_i)$, where $\rho_j$ is the parameter in the density $f_i$. The time constants will be varied in the different examples to reveal the effects of varying degrees of dependency on the various test statistics.

Several comments concerning the choices for the aforementioned parameters are in order. First, the parameters for the marginal densities were chosen to match as closely as possible the two densities involved. More precisely, the parameters are such that $E_0 X_1 = E_1 X_1$ and $E_0 X_1^2 = E_1 X_1^2$; that is, the first and second moments agree. The graphs of the two marginal densities can be observed in Figures 1 and 2. From the linear plots in Figure 1, it seems that this is a relatively difficult discrimination problem; however, the logarithmic plots in Figure 2 reveal that there is a great deal of discrimination capability in the tail regions, the Rayleigh density $f_0$ having a much heavier tail to the left and the lognormal density $f_1$ having a heavier tail to the right. Second, by taking the sequences of $\rho$-parameters to be exponential sequences, the underlying Gaussian processes become Markov processes, and thus methods for generating the processes on a computer become relatively simple.
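
The generation of the processes can be sketched as follows. This is a minimal illustration and not the program used for the results of this chapter. It assumes that the underlying Gaussian sequence is produced by a first-order autoregressive recursion (which gives the correlation $\exp(-j/\tau)$ at lag $j$), that the Rayleigh process is formed from two independent such sequences of variance $\theta$, and that the lognormal process is $\exp(A_1 + \sqrt{A_2}\,W)$ with $W$ a unit-variance Gaussian sequence. All function names are hypothetical.

```python
import math
import random


def gaussian_markov(n, tau, rng):
    """Stationary unit-variance Gaussian Markov sequence with lag-j correlation exp(-j/tau)."""
    a = math.exp(-1.0 / tau)          # one-step correlation
    s = math.sqrt(1.0 - a * a)        # innovation scale that keeps the variance at 1
    x = rng.gauss(0.0, 1.0)
    out = [x]
    for _ in range(n - 1):
        x = a * x + s * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def rayleigh_process(n, tau, theta, rng):
    """Z_k = sqrt(X_k^2 + Y_k^2), with X and Y independent Gaussian sequences of variance theta."""
    sd = math.sqrt(theta)
    x = gaussian_markov(n, tau, rng)
    y = gaussian_markov(n, tau, rng)
    return [sd * math.hypot(u, v) for u, v in zip(x, y)]


def lognormal_process(n, tau, a1, a2, rng):
    """exp(a1 + sqrt(a2) * W_k); a1 and a2 are taken as the mean and variance of the logarithm (an assumption)."""
    w = gaussian_markov(n, tau, rng)
    return [math.exp(a1 + math.sqrt(a2) * v) for v in w]
```

With $\theta = 4$, $A_1 = 0.8$, and $A_2 = 0.25$ the sample means of both sequences should lie near 2.5, in line with the moment matching described above.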

As  mentioned  above,  the  parameters  for  the  marginal  densities  shall  remain  the 
same  for  each  of  the  specific  examples  considered.  The  time  constants,  however,  will  be 
varied  in  the  different  examples.  We  shall  assign  a  label  Ei  to  each  of  the  examples  for 
easy  reference.  Table  1  lists  the  parameters  for  each  of  the  examples. 


5.2  The  Calculation  of  the  Nonlinearities 

For each of the examples we shall compare the performance of five different nonlinearities $g_i$, $i = 0, \ldots, 4$. We denote by $g_i$, for $0 \le i \le 3$, the optimal nonlinearity for the performance measure $S_i$, which is consistent with our usage in the preceding chapters. We also denote by $g_4$ the optimal iid nonlinearity given by $g_4 = \log(f_1/f_0)$ (cf. (1.2)). Thus we have also five different test statistics $T_i = \sum_{k=1}^{n} g_i(x_k)$, $i = 0, \ldots, 4$. The nonlinearity $g_4$ is computed easily since it has a closed form solution. To obtain the others, the corresponding integral equations from Chapters 2 and 3 must be solved.


Table 1. Parameters for the examples.

Example   $\tau_0$    $\tau_1$    $m_0$   $m_1$   $n$
$E_1$     13.0288     13.0288     60      60      1000
$E_2$     13.0288     130.288     60      600     1000
$E_3$     130.288     13.0288     600     60      1000
$E_4$     130.288     130.288     600     600     1000
$E_5$     13.0288     130.288     60      600     100
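
The closed form of $g_4$ can be written out directly. The sketch below is illustrative only: it uses the Rayleigh marginal of Appendix A with $\theta = 4$ and assumes the standard lognormal density with $A_1$ and $A_2$ as the mean and variance of the logarithm (Appendix B is not reproduced here, so that parametrization is an assumption). The function names are hypothetical.

```python
import math

THETA = 4.0            # Rayleigh parameter under H0 (from the text)
A1, A2 = 0.8, 0.25     # lognormal parameters under H1 (from the text)


def log_f0(x):
    """Log of the Rayleigh marginal f0(x) = (x/theta) exp(-x^2 / (2 theta)), x > 0."""
    return math.log(x / THETA) - x * x / (2.0 * THETA)


def log_f1(x):
    """Log of the lognormal marginal, assuming A1 = log-mean and A2 = log-variance."""
    lx = math.log(x)
    return -lx - 0.5 * math.log(2.0 * math.pi * A2) - (lx - A1) ** 2 / (2.0 * A2)


def g4(x):
    """The iid nonlinearity g4 = log(f1/f0)."""
    return log_f1(x) - log_f0(x)


def t4(xs):
    """The memoryless test statistic T4, the sum of g4 over the observations."""
    return sum(g4(x) for x in xs)
```

Working with the logarithms of the densities avoids underflow in the tails, where $g_4$ takes its largest magnitudes.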

Several issues must be considered in the numerical calculation of the nonlinearities. First, because the integral equations are derived for $m$-dependent processes, we must assign a value to $m$. Our criterion for doing so is to select $m$ so that $\rho_m < \rho_{\min}$. Thus we have two values $m_0$, $m_1$ corresponding to the processes under the two hypotheses $H_0$, $H_1$. In the results presented here, we have $\rho_{\min} = 0.01$. These results were tested by decreasing the value of $\rho_{\min}$, or equivalently, increasing the value of $m$, and it was found that the numerical results were unchanged, thus corroborating Theorems 6 and 11. The second issue is the choice of a finite interval $[x_{\min}, x_{\max}]$ over which the integration is to be performed. This amounts to truncating the densities, and is of special concern for the nonlinearities $g_0$ and $g_1$ since $f_1(x)/f_0(x)$ is unbounded as $x \to \infty$ and $f_0(x)/f_1(x)$ is unbounded as $x \to 0$. Thus for these cases the absolute term in the integral equation does not have a finite second moment, and it is therefore mandatory that the tails of the densities be modified or truncated for the problem to be well defined as a Fredholm equation. In other words, the condition (a) at the beginning of Section 2.3 is not satisfied unless the tails of the densities are modified by truncation or some other method. For the problems here, the interval was chosen so that under either hypothesis $P\{X_1 < x_{\min}\} \le \epsilon$ and $P\{X_1 > x_{\max}\} \le \epsilon$, where $\epsilon = 5 \times 10^{-5}$. This resulted in $x_{\min} \approx 0.02$ and $x_{\max} \approx 15.7$.
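
Both choices can be illustrated with a few lines of code. The sketch below is hedged: it assumes $\rho_j = \exp(-j/\tau)$, the Rayleigh marginal of Appendix A, and a lognormal marginal with $A_1$ and $A_2$ as the mean and variance of the logarithm. The function names are hypothetical, and the interval it produces should only be expected to be close to the one quoted above.

```python
import math
from statistics import NormalDist


def dependence_length(tau, rho_min=0.01):
    """Smallest integer m with exp(-m/tau) < rho_min."""
    return math.floor(tau * math.log(1.0 / rho_min)) + 1


def truncation_interval(theta, a1, a2, eps=5e-5):
    """Interval [x_min, x_max] leaving at most eps of mass in each tail under both marginals."""
    z = NormalDist().inv_cdf(1.0 - eps)                       # upper eps-quantile of N(0, 1)
    ray_lo = math.sqrt(-2.0 * theta * math.log(1.0 - eps))    # Rayleigh lower quantile
    ray_hi = math.sqrt(-2.0 * theta * math.log(eps))          # Rayleigh upper quantile
    logn_lo = math.exp(a1 - math.sqrt(a2) * z)                # lognormal lower quantile
    logn_hi = math.exp(a1 + math.sqrt(a2) * z)                # lognormal upper quantile
    return min(ray_lo, logn_lo), max(ray_hi, logn_hi)
```

For the time constants of Table 1 the first function returns 60 and 600. In the second, the lower endpoint is set by the Rayleigh density and the upper endpoint by the lognormal density, which is consistent with the tail behavior seen in Figure 2.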

The most direct method for solving a Fredholm equation is to approximate the integral with a numerical quadrature formula, and thereby transform the problem into a system of linear equations. In the method used here the quadrature formula was a composite Simpson's rule with $N = 301$ nodes. The argument goes as follows. First approximate the integral by the weighted sum thus:
\[ g(x) \approx h(x) + \sum_{i=1}^{N} K(x, x_i) g(x_i) w_i, \tag{5.1} \]
where $h$ denotes the absolute term of the integral equation, $K$ its kernel, and $x_i$, $w_i$ the quadrature nodes and weights. We find our numerical solution by solving the $N$ linear equations
\[ u_j = h(x_j) + \sum_{i=1}^{N} K(x_j, x_i) u_i w_i, \qquad j = 1, \ldots, N, \tag{5.2} \]
for the $N$ unknowns $u_i$, $i = 1, \ldots, N$. If the numerical integration in (5.1) is reasonably accurate for all the $x$ values in the interval $[x_{\min}, x_{\max}]$, then $g(x_i)$ will solve a linear system of equations similar to those in (5.2) but with slightly perturbed coefficients. Therefore, provided the coefficient matrix is not ill-conditioned, the solution of (5.2) will give a reasonably approximate solution to the integral equation. In fact, Fredholm actually proved that such approximations converge to the solution as $N$ approaches infinity. Obtaining a numerical solution for the integral equation (3.15) requires a little more effort, in that one must solve the linear equations for several different values of $r$ until the sequence of $r$ values appears to be near its limit. In the problems solved
here,  the  initial  value  was  r  =  1  and  convergence  to  within  10~*  occurred  after  10-23 
iterations for the four examples. For any of the integral equations, if the values of $m_0$ and $m_1$ are large, then typically the greater part of the CPU time is spent in initializing the matrix for the linear system because of the sums in the kernels. For the integral equation (3.15), where the linear system must be solved several times, it is economical to save the values of the sums.
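
The quadrature approach described above can be sketched in a few dozen lines. The code below is an illustration under stated assumptions, not the program used here: it treats a generic Fredholm equation of the second kind, $g(x) = h(x) + \int K(x, y) g(y)\,dy$, with hypothetical callables `h` and `kernel`, and it uses a small pure-Python elimination routine where a linear algebra library would normally be preferred.

```python
def simpson_weights(a, b, n):
    """Nodes and weights of the composite Simpson rule on [a, b]; n must be odd."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    step = (b - a) / (n - 1)
    xs = [a + i * step for i in range(n)]
    ws = [step / 3.0 * (1 if i in (0, n - 1) else 4 if i % 2 else 2) for i in range(n)]
    return xs, ws


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs are not modified)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def nystrom(h, kernel, a, b, n=301):
    """Solve u_j = h(x_j) + sum_i kernel(x_j, x_i) u_i w_i at the Simpson nodes."""
    xs, ws = simpson_weights(a, b, n)
    mat = [[(1.0 if i == j else 0.0) - kernel(xs[j], xs[i]) * ws[i] for i in range(n)]
           for j in range(n)]
    rhs = [h(x) for x in xs]
    return xs, solve_linear(mat, rhs)
```

As a check, the equation $g(x) = x + \int_0^1 x y\, g(y)\,dy$ has the solution $g(x) = 1.5x$, and the sketch reproduces it at the nodes because Simpson's rule integrates the quadratic integrand exactly.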

Shown in Figures 3-6 are the graphs of the numerically computed nonlinearities for the problem $E_2$. Figure 3 displays the nonlinearities $g_0$ and $g_4$, while Figure 4, which is drawn to a much smaller scale than Figure 3, displays the nonlinearities $g_1$, $g_2$, and $g_3$. Figures 5 and 6 are semilogarithmic plots which show the right and left tails of the nonlinearities. The tail behavior is of concern because, as we noted by observing Figure 2, this is where most of the discrimination capability lies. We might try to predict the performance of each of the test statistics by observing the shapes of the corresponding nonlinearities. For $g_0$, the heavy tail to the right will cause a separation of the means $\mu_0(g_0)$ and $\mu_1(g_0)$ at the expense of making $\sigma_1^2(g_0)$ rather large. Of course, $\sigma_0^2(g_0)$ will not be affected in a serious way by the right tail because, as can be seen in Figure 2, $f_0$ places little mass in that region. We notice the reverse situation for $g_1$, where the heavy tail on the left should create a separation of $\mu_0(g_1)$ and $\mu_1(g_1)$ at the expense of making $\sigma_0^2(g_1)$ large. Because $g_4$ has heavy tails on both the left and the right, we would expect $\mu_1(g_4) - \mu_0(g_4)$ to be large, as well as $\sigma_0^2(g_4)$ and $\sigma_1^2(g_4)$. Finally, we note that $g_2$ and $g_3$ do not have heavy tails to either the left or right, and thus we expect $\mu_1 - \mu_0$ to be small as well as $\sigma_0^2$ and $\sigma_1^2$ for each case. Table 2, which lists the values of these moments for each of the nonlinearities for example $E_2$, shows that such predictions are accurate. One final comment concerning the shapes of the nonlinearities is worth mentioning. Because of the lopsided nature of the nonlinearities $g_0$ and $g_1$, the distributions of $g_0(X_1)$ and $g_1(X_1)$ will be skewed to the right and left, respectively. Thus convergence of the sums $T_0$ and $T_1$ to a Gaussian distribution will be slow. On the other hand, because the nonlinearities $g_2$ and $g_3$ are relatively small in magnitude and "balanced," the convergence of $T_2$ and $T_3$ to a Gaussian should be more rapid. The heavy tails on both the left and right of $g_4$ should cause $g_4(X_1)$ to be skewed to the left under $H_0$ and skewed to the right under $H_1$. Thus convergence of $T_4$ to a Gaussian distribution should also be rather slow. The same general phenomena occur for the other examples as well.

Figure 4. Linear graphs of the nonlinearities $g_1$, $g_2$, and $g_3$ for Example $E_2$.

[Figure: semilogarithmic graphs of the nonlinearities $g_i$, $i = 0, \ldots, 4$, for Example $E_2$.]

Table 2. Values of $\mu_i$ and $\sigma_i^2$ evaluated at each of the nonlinearities for Example $E_2$.

         $\mu_0$        $\sigma_0^2$   $\mu_1$        $\sigma_1^2$
$g_0$    -2.9643e-01    3.1015e+01     9.6166e+02     9.3037e+05
$g_1$    -1.2197e+09    1.3087e+11     9.1490e-06     3.4924e+04
$g_2$    -4.9429e-03    5.3269e-02     5.6676e-07     1.7041e-02
$g_3$    -8.7036e-03    7.9951e-02     1.7709e-06     4.8095e-02
$g_4$    -1.6660e-01    1.4059e+00     8.0028e-02     5.7433e+00

In Tables 3-6 are listed the values of the performance measures evaluated at each of the nonlinearities. We observe for each case that the numerical solutions are consistent with the goal that $g_i$ maximize $S_i$ for $i = 0, 1, 2, 3$.
Table 3. Values of the performance measures evaluated at each of the nonlinearities for Example $E_1$.

         $S_0$         $S_1$         $S_2$         $S_3$
$g_0$    9.6196e+02    1.0008e-05    1.0006e-05    1.0008e-05
$g_1$    8.6869e-05    7.0651e+10    8.6869e-05    8.6869e-05
$g_2$    2.2580e-02    9.3601e-02    1.0155e-02    1.8191e-02
$g_3$    3.0055e-02    5.4255e-02    9.8783e-03    1.9341e-02
$g_4$    3.0775e-02    2.4026e-02    6.7720e-03    1.3493e-02

Table 4. Values of the performance measures evaluated at each of the nonlinearities for Examples $E_2$ and $E_5$.

         $S_0$         $S_1$         $S_2$         $S_3$
$g_0$    9.6196e+02    1.0690e-06    1.0690e-06    1.0690e-06
$g_1$    8.6869e-05    1.2197e+09    8.6869e-05    8.6869e-05
$g_2$    8.6121e-03    8.4155e-02    4.9434e-03    7.8126e-03
$g_3$    1.1856e-02    3.2762e-02    4.6221e-03    8.7054e-03
$g_4$    3.0775e-02    1.8440e-03    1.1901e-03    1.7397e-03
Table 5. Values of the performance measures evaluated at each of the nonlinearities for Example $E_3$.

         $S_0$         $S_1$         $S_2$         $S_3$
$g_0$    1.7417e+02    4.9968e-06    4.9951e-06    4.9968e-06
$g_1$    8.2661e-05    7.0651e+10    8.2661e-05    8.2661e-05
$g_2$    5.5079e-03    2.8253e-02    2.6506e-03    4.6093e-03
$g_3$    6.5927e-03    1.8214e-02    2.5700e-03    4.8406e-03
$g_4$    4.4120e-03    2.4026e-02    2.1620e-03    3.7275e-03

Table 6. Values of the performance measures evaluated at each of the nonlinearities for Example $E_4$.

         $S_0$         $S_1$         $S_2$         $S_3$
$g_0$    1.7417e+02    8.9601e-07    8.9588e-07    8.9601e-07
$g_1$    8.2661e-05    1.2197e+09    8.2661e-05    8.2661e-05
$g_2$    1.8637e-03    6.0845e-02    1.3499e-03    1.8083e-03
$g_3$    3.0956e-03    8.4433e-03    1.2010e-03    2.2651e-03
$g_4$    4.4120e-03    1.8440e-03    6.8021e-04    1.3005e-03

5.3  Simulation  Results 


Figures 7-11 contain the graphs of the receiver operating characteristic (ROC) curves for each of the examples. These were generated by tabulating the results of 10,000 simulations under each hypothesis. Since the values of the nonlinearities were computed for only those values in the interval $[x_{\min}, x_{\max}]$, a method had to be chosen to deal with those observations outside the interval, and this was resolved by limiting the observations, so that a value outside the interval was reset to $x_{\min}$ or $x_{\max}$, whichever was the closest. The reasoning behind such a method is that real world observations are in fact limited since the instruments which make the measurements are limited. The endpoints $x_{\min}$ and $x_{\max}$ were selected so that the probability of an observation being larger than $x_{\max}$, for example, would be less than $5 \times 10^{-5}$, and the actual proportion of observations which occurred beyond $x_{\max}$ or below $x_{\min}$ in the simulations proved to be consistent with this probability. Thus the effect of this limiting is practically negligible.
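
The tabulation step can be sketched as follows. This is an illustrative outline only: it assumes that $H_1$ is decided when the test statistic exceeds a threshold, that $P_0$ is the probability of deciding $H_1$ under $H_0$ and $P_1$ the probability of deciding $H_0$ under $H_1$, and that the simulated statistics under each hypothesis are available as lists. The function names are hypothetical.

```python
from bisect import bisect_right


def clip(x, lo, hi):
    """Limit an observation to the interval [lo, hi]."""
    return min(max(x, lo), hi)


def clipped_statistic(g, xs, lo, hi):
    """Memoryless test statistic: the sum of g over the limited observations."""
    return sum(g(clip(x, lo, hi)) for x in xs)


def roc_points(t0, t1):
    """Empirical (threshold, P0, P1) triples from statistics simulated under H0 (t0) and H1 (t1)."""
    s0, s1 = sorted(t0), sorted(t1)
    points = []
    for thr in sorted(set(s0) | set(s1)):
        p0 = (len(s0) - bisect_right(s0, thr)) / len(s0)   # fraction of H0 runs above the threshold
        p1 = bisect_right(s1, thr) / len(s1)               # fraction of H1 runs at or below it
        points.append((thr, p0, p1))
    return points


def minimax_point(t0, t1):
    """The tabulated point at which P0 and P1 are closest to equal."""
    return min(roc_points(t0, t1), key=lambda p: abs(p[1] - p[2]))
```

With a finite number of simulations the two error estimates are rarely exactly equal, so the sketch returns the tabulated point at which they differ least.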

The ROCs in Figures 7-11 are plots of the error probability $P_1$ versus the error probability $P_0$ on logarithmic scales. Our main concern shall be the minimax point of the ROC, or that point where $P_0 = P_1$. This region occurs along the diagonal which extends from the lower left corner to the upper right corner of the graph and gives us an ordering of the nonlinearities. The approximate values of $P_1$ (and hence also $P_0$) at the minimax point are listed in Table 7. From Figure 7, which corresponds to the example $E_1$, we see that the iid nonlinearity $g_4$ performs uniformly better than the others, which is to be expected because the dependency under either hypothesis is relatively weak. The ordering, from best to worst, continues with $g_3$, $g_2$, $g_1$, and finally $g_0$.

Table 7. Approximate values of $P_0$ (and $P_1$) at the minimax region of the ROCs for each of the nonlinearities as estimated through computer simulation.

Example   $g_0$      $g_1$         $g_2$         $g_3$      $g_4$
$E_1$     4.1e-03    2.2e-03       1.1e-03       9.0e-04    7.0e-04
$E_2$     1.1e-01    2.6e-03       4.3e-03       6.5e-03    3.5e-02
$E_3$     4.0e-02    2.8e-02       2.4e-02       2.8e-02    2.3e-02
$E_4$     1.8e-01    4.9e-02       4.4e-02       6.9e-02    9.8e-02
$E_5$     3.7e-01    (illegible)   (illegible)   2.4e-01    1.7e-01

Figure 8 corresponds to $E_2$, where there is a relatively strong dependency under the hypothesis $H_1$. The ordering is $g_1$, $g_2$, $g_3$, $g_4$, and $g_0$, a result which is rather pleasing to the intuition. Since there is memory in the observations, the nonlinearity $g_4$ which is designed under a no memory assumption performs relatively poorly. The nonlinearity $g_1$, however, was designed to minimize $\sigma_1^2$, which essentially captures all of the dependency under $H_1$. Thus $g_1$ achieves its relatively good performance by minimizing the effects of the dependency, and this may perhaps be the only way to handle dependency when a memoryless discriminator is to be used.

With this concept in mind, we now examine Figure 9, corresponding to $E_3$, in which the dependency under $H_0$ is relatively strong. Although we would expect a reverse of the situation of $E_2$, we find again that the ordering at the minimax point is $g_4$, $g_2$, $g_3$, $g_1$, $g_0$, which is like that of $E_1$ except that the positions of $g_2$ and $g_3$ are reversed. There is not a large difference in the performance from best to worst, and in fact $g_1$ and $g_3$ are actually tied for the third position. As we proceed to the left of the curves from the minimax region, we find that there is a region where $g_1$ performs best, and finally a region where $g_0$ performs best. If we examine the situation a little more carefully, we may also have an intuitively pleasing explanation for this result. The correlation with which we are actually dealing is that of the underlying Gaussian process(es). The $\rho$-parameters in the bivariate densities in the appendices are the actual correlation coefficients of the Gaussian processes, but are related to the correlation coefficients of the Rayleigh and lognormal processes in a one-to-one manner. Denote by $\rho_R(\rho)$ the correlation coefficient of the Rayleigh density as a function of the parameter $\rho$ from the bivariate density. Let $\rho_L(\rho)$ denote this function for the lognormal density. Then $\rho_L$ has the explicit form
\[ \rho_L(\rho) = \frac{e^{A_2 \rho} - 1}{e^{A_2} - 1} \]
and we note that the derivative of $\rho_L$ at 0 is positive. In fact, for $A_2 = 0.25$ the derivative at 0 has the approximate value 0.88. Although there is no closed form expression for $\rho_R$, one can show that the derivative at 0 is 0. The implication from this is that the correlation of the Rayleigh density for a given value of the parameter $\rho$ is much less than that of the lognormal density. This makes intuitive sense since the Rayleigh process involves the sum of two Gaussian processes, whereas the lognormal process involves only one. Thus for $E_3$, there is an increase in the dependency under $H_0$ compared to that for $E_1$, but this increase is perhaps not so significant that $g_0$ would perform better than $g_4$, as we might expect. What we observe, however, is a degradation of the results for $E_1$ due to the increase in the dependency under $H_0$.
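
The explicit form of the lognormal correlation and the value 0.88 can be checked with a short calculation. The derivation below is a sketch which assumes that the lognormal variables are $X_k = \exp(A_1 + \sqrt{A_2}\,W_k)$, where $(W_1, W_2)$ is a standard bivariate Gaussian pair with correlation $\rho$; this parametrization is an assumption, since Appendix B is not reproduced here.

```latex
\begin{align*}
E X_1 X_2 &= e^{2A_1}\, E \exp\{\sqrt{A_2}\,(W_1 + W_2)\} = e^{2A_1 + A_2(1 + \rho)},
  \qquad (E X_1)^2 = e^{2A_1 + A_2}, \\
\operatorname{Cov}(X_1, X_2) &= e^{2A_1 + A_2}\,(e^{A_2 \rho} - 1),
  \qquad \operatorname{Var} X_1 = e^{2A_1 + A_2}\,(e^{A_2} - 1), \\
\rho_L(\rho) &= \frac{e^{A_2 \rho} - 1}{e^{A_2} - 1},
  \qquad \rho_L'(0) = \frac{A_2}{e^{A_2} - 1} = \frac{0.25}{e^{0.25} - 1} \approx 0.880 .
\end{align*}
```

The first line uses the fact that $W_1 + W_2$ is Gaussian with mean 0 and variance $2 + 2\rho$.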

We have the ROCs corresponding to $E_4$ in Figure 10, where a nearly uniform ordering is $g_1$, $g_2$, $g_3$, $g_4$, $g_0$. This ordering is precisely that of $E_2$, although there is not as large a difference in performance here. This is the situation in which the dependency under both hypotheses has been increased from $E_1$. From the discussion of the preceding paragraph, we know that the dependency under $H_1$ is stronger than that under $H_0$. Thus our discussion concerning the results from $E_2$ applies here, and we may interpret this as a case in which the performance from $E_2$ is degraded due to the increased dependency under $H_0$.

In Figure 11 we observe the results for $E_5$, where we hope to discern the change in the performance from $E_2$ by taking a smaller sample size. We find here that the performance of each nonlinearity has declined, as is to be expected. In the minimax region the ordering is essentially the same as that in $E_2$, although there is a small region
where  go  performs  best.  Clearly,  though,  the  performance  of  gt,  go,  and  go  are  practically 
the  same. 
5.4  Conclusion


In conclusion, we may wish to consider again the questions which were posed in the beginning of this chapter. First, regarding the validity of the performance measures, the simulation results and the values of the performance measures do not necessarily correlate well. In other words, knowing the values of a particular performance measure for two given nonlinearities, we may not be able to predict which nonlinearity will perform better in the simulations. Indeed, the nonlinearity $g_4$ does not maximize any of the performance measures $S_i$, yet it has shown the best performance in the example $E_1$. This does not mean that the performance measures are not useful. On the contrary, they provide us with a method for calculating other nonlinearities which, as evidenced here, might prove to perform better than the iid nonlinearity. Nor does this mean that the theory is flawed, since the tests used here required finite sample sizes, an aspect which was neglected in the theory. One consequence of the finite sample size is that the distributions of the test statistics are not truly Gaussian. In fact, the distributions of the test statistics $T_0$ and $T_1$ are strongly skewed. Although this might be undesirable from the standpoint of the theory alone, this phenomenon is not necessarily undesirable in practice. Consider the test statistic $T_1$, for example. The variance of $T_1$ under the hypothesis $H_0$ is extremely large compared to the variance under $H_1$ and compared to the difference between the means under the two hypotheses. However, because the distribution of $T_1$ is skewed to the left, most of the outliers under $H_0$ fall away from the threshold, and this results in a generally good performance.

Second, concerning the performance of memoryless discriminators for dependent processes, it has been demonstrated that in a situation of weak dependency, the iid discriminator can be difficult to improve upon, while for a situation of strong dependency, the methods derived here show definite improvement over the iid discriminator. We might now conjecture as to how such an improvement might come about. If the dependency is strong under one of the hypotheses, say $H_1$, then the result of maximizing $S_1$ will likely lead to an improvement. This is because maximizing $S_1$ will result in a small value of $\sigma_1^2$, the effect of which is to minimize the much stronger dependency condition under $H_1$. This conjecture seems to be corroborated by the simulation results from $E_2$, $E_4$, and $E_5$, where the dependency under $H_1$ is stronger than that under $H_0$. If, however, there is strong dependency under both of the hypotheses, then perhaps the best approach would be to maximize $S_2$ or $S_3$, since in this way both of the dependency conditions are minimized. The relevant example here is $E_4$, where $g_1$ still performs best. However, as we noted above, the dependency is still somewhat stronger under $H_1$ than under $H_0$. The basic premise of the method described here is that when using memoryless discriminators for dependent processes, the best one can do to deal with the dependency conditions is to minimize their effects. The performance measures $S_i$ provide a framework for doing so.

It is satisfying to observe in the simulation results that $g_2$ and $g_3$ perform comparably, though $g_2$ generally performs slightly better. This is important because the calculation of $g_3$ is much easier than the calculation of $g_2$. Considering the good overall performance of the nonlinearity $g_3$, especially when there is strong dependency under both hypotheses, and the relatively simple calculation needed to determine it, this nonlinearity might be preferred in most situations.

The  theory  which  has  been  presented  here  is  entirely  based  on  central  limit  theory 
and  the  assumption  of  large  sample  sizes.  Indeed,  all  of  the  performance  measures  derived 
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here  have  been  asymptotic  performance  measures.  Admittedly,  this  approach  might 
seem  to  be  too  narrow  in  its  application  to  be  useful  in  situations  where  a  relatively 
moderate  or  small  sarnie  size  is  required.  However,  when  faced  with  the  problem  of 
discriminating  between  two  possible  sources  for  an  observed  random  process  in  which 
there  is  correlation  in  the  observations,  there  is  little  else  that  one  can  do  other  than  the 
LRT.  Therefore,  the  work  here  is  significant  in  that  it  does  present  an  alternative  to  the 
LRT:  the  memoryless  decision  rule.  In  situations  where  the  LRT  cannot  be  implemented 
and  there  is  decorrelation  of  the  observations  with  time,  one  might  be  tempted  to  assume 
that  the  observations  are  iid,  thereby  leading  to  a  memoryless  discriminator.  We  have 
seen  here  several  alternative  memoryless  discriminators,  some  of  which  might  improve 
upon  the  iid  discriminator.  Furthermore,  the  simulation  results  have  demonstrated  that 
good  results  can  be  obtained  for  even  moderate  sample  sizes.  One  possible  area  for 
future  research  might  be  to  examine  more  thoroughly  the  performance  for  various  sample 
sizes,  and  we  might  also  note  that  these  memoryless  discriminators  are  ideally  suited  for 
sequential  discrimination.  We  have  also  seen  how  some  of  the  results  can  be  made 
robust.  Robust  discrimination  is  important  in  many  applications  where  circumstances 
might  vary  from  test  to  test,  such  as  a  situation  of  radar  discrimination  of  targets.  Thus 
these  robustness  results  are  also  significant,  and  the  application  of  these  results  to  radar 
problems  might  also  be  an  area  for  future  research. 
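
As a concrete illustration of the memoryless decision rule discussed above, the following Python sketch sums a nonlinearity g over the observations and compares the sum with a threshold. It is only a sketch under stated assumptions: the Rayleigh marginals (Appendix A), the parameter names theta0 and theta1, the default threshold of zero, and the function names are all chosen here for illustration. The nonlinearity shown is the iid log-likelihood ratio of the marginals, not one of the optimized nonlinearities derived in this thesis.

```python
import math


def iid_rayleigh_nonlinearity(theta0, theta1):
    """Return g(x) = log(f1(x) / f0(x)) for Rayleigh marginals.

    Assumption: under hypothesis H_k the marginal density is
    (x / theta_k) * exp(-x^2 / (2 * theta_k)), as in Appendix A.
    """
    def g(x):
        return math.log(theta0 / theta1) + 0.5 * x * x * (1.0 / theta0 - 1.0 / theta1)
    return g


def memoryless_decision(samples, g, threshold=0.0):
    """Decide H1 (return 1) iff the sum of g over the samples exceeds the threshold.

    The threshold is a free design parameter; zero is an illustrative default.
    """
    statistic = sum(g(x) for x in samples)
    return 1 if statistic > threshold else 0
```

For instance, with `g = iid_rayleigh_nonlinearity(1.0, 4.0)`, large observations push the statistic upward and favor the hypothesis with the larger parameter, while small observations favor the other. Any other memoryless nonlinearity can be passed in place of `g` without changing `memoryless_decision`.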
APPENDIX A


The  Rayleigh  Distribution 


Let $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$ be independent and identically
distributed Gaussian random vectors such that $E X_i = E Y_i = 0$ and
$\mathrm{Var}\, X_i = \mathrm{Var}\, Y_i = \theta_i$ for $i = 1, \ldots, n$. Suppose also that
$E X_i X_j / \sqrt{\theta_i \theta_j} = E Y_i Y_j / \sqrt{\theta_i \theta_j} = \rho_{ij}$. Then the
random vector $Z = (Z_1, \ldots, Z_n)$ defined by $Z_i = \sqrt{X_i^2 + Y_i^2}$ has a Rayleigh
distribution. For $n > 2$, the n-dimensional density involves $n - 1$ iterated integrations
and does not have a closed-form expression. The bivariate density for $(Z_i, Z_j)$ is

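$$
f_{Z_i Z_j}(u,v) = \frac{uv}{(1-\rho_{ij}^2)\,\theta_i\theta_j}
\exp\left\{ -\frac{1}{2(1-\rho_{ij}^2)} \left[ \frac{u^2}{\theta_i} + \frac{v^2}{\theta_j} \right] \right\}
I_0\!\left[ \frac{\rho_{ij}\,uv}{(1-\rho_{ij}^2)\sqrt{\theta_i\theta_j}} \right],
\qquad (u > 0,\ v > 0) \tag{A1}
$$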

where $I_0$ is the modified Bessel function of the first kind of order 0. If the vectors $X$ and
$Y$ are stationary, then $Z$ is stationary and the density for $(Z_1, Z_{j+1})$ can be written

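$$
f_{Z_1 Z_{j+1}}(u,v) = \frac{uv}{(1-\rho_j^2)\,\theta^2}
\exp\left[ -\frac{u^2 + v^2}{2(1-\rho_j^2)\,\theta} \right]
I_0\!\left[ \frac{\rho_j\,uv}{(1-\rho_j^2)\,\theta} \right],
\qquad (u > 0,\ v > 0) \tag{A2}
$$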


where $\theta = \theta_1 = \theta_{j+1}$ and $\rho_j = \rho_{1,j+1}$. The marginal density takes the form


$$
f(u) = \frac{u}{\theta} \exp\left( -\frac{u^2}{2\theta} \right), \qquad (u > 0).
$$
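
The stationary bivariate density (A2) and the marginal density can be evaluated numerically with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the Bessel function $I_0$ is computed from its power series so that only the standard library is needed. A dedicated numerical library would normally be preferable for large arguments.

```python
import math


def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0, via its power series.

    I_0(x) = sum over k >= 0 of (x^2 / 4)^k / (k!)^2.
    """
    quarter_x2 = 0.25 * x * x
    term = 1.0
    total = 1.0
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= quarter_x2 / (k * k)
        total += term
    return total


def rayleigh_marginal(u, theta):
    """Marginal Rayleigh density (u / theta) * exp(-u^2 / (2 * theta)) for u > 0."""
    if u <= 0:
        return 0.0
    return (u / theta) * math.exp(-u * u / (2.0 * theta))


def rayleigh_bivariate(u, v, theta, rho):
    """Stationary bivariate Rayleigh density (A2), with rho playing the role of rho_j."""
    if u <= 0 or v <= 0:
        return 0.0
    c = (1.0 - rho * rho) * theta
    return (u * v / (c * theta)) * math.exp(-(u * u + v * v) / (2.0 * c)) * bessel_i0(rho * u * v / c)
```

Two quick sanity checks follow from the formulas themselves: with `rho = 0` the bivariate density factors into the product of the two marginals, and for any admissible `rho` a numerical integral over the positive quadrant should be close to one.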


The  moments  of  the  Rayleigh  random  variable  Z  are  given  by 


$$
E Z^n = (2\theta)^{n/2}\, \Gamma\!\left( \frac{n}{2} + 1 \right) \qquad (n = 1, 2, \ldots)
$$

where $\Gamma$ denotes the gamma function.
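
The moment formula can be checked against simulated data. The following Python sketch builds a stationary Rayleigh sequence from two independent Gaussian sequences, as in the construction at the start of this appendix. The first-order autoregressive dependence (so that the Gaussian correlation at lag j is rho**j), the function names, and the seed are assumptions made purely for illustration; the appendix itself does not prescribe a particular correlation structure.

```python
import math
import random


def rayleigh_moment(n, theta):
    """E Z^n = (2 * theta)^(n / 2) * Gamma(n / 2 + 1)."""
    return (2.0 * theta) ** (n / 2.0) * math.gamma(n / 2.0 + 1.0)


def simulate_rayleigh(num, theta, rho, seed=0):
    """Stationary Rayleigh sequence Z_i = sqrt(X_i^2 + Y_i^2).

    Assumption: X and Y are independent zero-mean Gaussian AR(1)
    sequences with variance theta and lag-one correlation rho.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(theta)
    innov = math.sqrt(1.0 - rho * rho)  # keeps Var X_i = theta at every step
    x = rng.gauss(0.0, sigma)
    y = rng.gauss(0.0, sigma)
    z = []
    for _ in range(num):
        z.append(math.hypot(x, y))
        x = rho * x + innov * rng.gauss(0.0, sigma)
        y = rho * y + innov * rng.gauss(0.0, sigma)
    return z
```

For example, averaging `v * v` over `simulate_rayleigh(20000, theta=2.0, rho=0.5)` should give a value close to `rayleigh_moment(2, 2.0)`, which equals 4. The agreement is only statistical, and it becomes looser as `rho` approaches one because the effective sample size shrinks.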
APPENDIX B


The  Lognormal  Distribution 


Let $X = (X_1, \ldots, X_n)$ be a Gaussian random vector with $E X_i = \lambda_1^{(i)}$ and
$\mathrm{Var}\, X_i = \lambda_2^{(i)}$. Suppose also that $X_i$ and $X_j$ have correlation coefficient
$\rho_{ij}$, that is, $\mathrm{Cov}(X_i, X_j) / \sqrt{\lambda_2^{(i)} \lambda_2^{(j)}} = \rho_{ij}$. Then the
random vector $Z = (Z_1, \ldots, Z_n)$ defined by $Z_i = \exp(X_i)$ has a lognormal distribution.
If $F_X$ and $F_Z$ denote the n-dimensional distribution functions for the Gaussian random
variables and lognormal random variables, respectively, then we have the relation


$$
F_Z(z_1, \ldots, z_n) = F_X(\log z_1, \ldots, \log z_n). \tag{B1}
$$


We  may  therefore  obtain  a  relation  for  the  densities  by  differentiating,  and  this  yields 


$$
f_Z(z_1, \ldots, z_n) = \frac{1}{z_1 \cdots z_n}\, f_X(\log z_1, \ldots, \log z_n). \tag{B2}
$$


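The explicit expression for the bivariate density for $(Z_i, Z_j)$ is

$$
f_{Z_i Z_j}(u,v) = \left[ 2\pi uv \sqrt{\lambda_2^{(i)} \lambda_2^{(j)} (1-\rho_{ij}^2)} \right]^{-1}
\exp\left\{ -\frac{1}{2(1-\rho_{ij}^2)} \left[
\frac{(\log u - \lambda_1^{(i)})^2}{\lambda_2^{(i)}}
- \frac{2\rho_{ij}(\log u - \lambda_1^{(i)})(\log v - \lambda_1^{(j)})}{\sqrt{\lambda_2^{(i)} \lambda_2^{(j)}}}
+ \frac{(\log v - \lambda_1^{(j)})^2}{\lambda_2^{(j)}}
\right] \right\} \tag{B3}
$$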


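If the random vector $X$ is stationary, then so is the vector $Z$, and the density for
$(Z_1, Z_{j+1})$ can be written

$$
f_j(u,v) = \left[ 2\pi uv \lambda_2 \sqrt{1-\rho_j^2} \right]^{-1}
\exp\left\{ -\frac{(\log u - \lambda_1)^2 - 2\rho_j(\log u - \lambda_1)(\log v - \lambda_1) + (\log v - \lambda_1)^2}{2\lambda_2(1-\rho_j^2)} \right\} \tag{B4}
$$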
where $\lambda_1 = \lambda_1^{(1)} = \lambda_1^{(j+1)}$, $\lambda_2 = \lambda_2^{(1)} = \lambda_2^{(j+1)}$, and $\rho_j = \rho_{1,j+1}$. The marginal density has the form


$$
f(u) = \frac{1}{u\sqrt{2\pi\lambda_2}} \exp\left\{ -\frac{(\log u - \lambda_1)^2}{2\lambda_2} \right\}. \tag{B5}
$$


The  moments  of  the  lognormal  random  variable  Z  are  given  by 


$$
E Z^n = \exp\left[ n\lambda_1 + \tfrac{1}{2} n^2 \lambda_2 \right] \qquad (n = 1, 2, \ldots). \tag{B6}
$$
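
The lognormal formulas of this appendix can be exercised in the same spirit. The Python sketch below evaluates the stationary bivariate density (B4), the marginal density (B5), and the moments (B6), and generates a stationary lognormal sequence by exponentiating a Gaussian sequence. The function names, the first-order autoregressive dependence of the Gaussian sequence, and the seed are illustrative assumptions, not part of the appendix.

```python
import math
import random


def lognormal_moment(n, lam1, lam2):
    """E Z^n = exp(n * lam1 + n^2 * lam2 / 2), with lam1 = E X and lam2 = Var X."""
    return math.exp(n * lam1 + 0.5 * n * n * lam2)


def lognormal_marginal(u, lam1, lam2):
    """Marginal lognormal density (B5) for u > 0."""
    if u <= 0:
        return 0.0
    a = math.log(u) - lam1
    return math.exp(-a * a / (2.0 * lam2)) / (u * math.sqrt(2.0 * math.pi * lam2))


def lognormal_bivariate(u, v, lam1, lam2, rho):
    """Stationary bivariate lognormal density (B4), with rho playing the role of rho_j."""
    if u <= 0 or v <= 0:
        return 0.0
    a = math.log(u) - lam1
    b = math.log(v) - lam1
    q = (a * a - 2.0 * rho * a * b + b * b) / (2.0 * lam2 * (1.0 - rho * rho))
    return math.exp(-q) / (2.0 * math.pi * u * v * lam2 * math.sqrt(1.0 - rho * rho))


def simulate_lognormal(num, lam1, lam2, rho, seed=0):
    """Stationary lognormal sequence Z_i = exp(X_i).

    Assumption: X is a Gaussian AR(1) sequence with mean lam1,
    variance lam2, and lag-one correlation rho.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(lam2)
    innov = math.sqrt(1.0 - rho * rho)  # keeps Var X_i = lam2 at every step
    x = rng.gauss(0.0, sigma)           # zero-mean part of X_i
    z = []
    for _ in range(num):
        z.append(math.exp(lam1 + x))
        x = rho * x + innov * rng.gauss(0.0, sigma)
    return z
```

As with the Rayleigh case, setting `rho = 0` makes the bivariate density factor into the product of two marginals, and the sample mean of a long simulated sequence should lie close to `lognormal_moment(1, lam1, lam2)`. Higher moments converge more slowly because the lognormal distribution is heavy-tailed.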